IDENTIFICATION OF POLYCYSTIC KIDNEY
DISEASE GENE, DIAGNOSTICS AND TREATMENT 

This is a continuation-in-part of U.S. Serial No.
08/253,524, filed June 3, 1994, which is incorporated by
reference herein in its entirety.

1. INTRODUCTION 
The present invention relates to the identification of
the gene, referred to as the PKD1 gene, mutations in which
are responsible for the vast majority of cases of
autosomal dominant polycystic kidney disease (ADPKD). The
PKD1 gene, including the complete nucleotide sequence of the
gene's coding region, is presented. Further, the complete
PKD1 gene product amino acid sequence and protein structure,
and antibodies directed against the PKD1 gene product, are
also presented. Additionally, the present invention relates
to therapeutic methods and compositions for the treatment of
ADPKD symptoms. Methods are also presented for the
identification of compounds that modulate the level of
expression of the PKD1 gene or the activity of mutant PKD1
gene product, and for the evaluation and use of such compounds in
the treatment of ADPKD symptoms. Still further, the present
invention relates to prognostic and diagnostic, including
prenatal, methods and compositions for the detection of
mutant PKD1 alleles and/or abnormal levels of PKD1 gene
product or gene product activity.

2. BACKGROUND OF THE INVENTION
Autosomal dominant polycystic kidney disease (ADPKD) is 
among the most prevalent dominant human disorders, affecting 
between 1 in 1,000 and 1 in 3,000 individuals worldwide 
(Dalgaard, O.Z., 1957, Acta Med. Scand. 158:1-251). The
major manifestation of the disorder is the progressive cystic
dilation of renal tubules (Gabow, P.A., 1990, Am. J. Kidney
Dis. 16:403-413), leading to renal failure in half of
affected individuals by age 50.
ADPKD-associated renal cysts may enlarge to contain
several liters of fluid, and the kidneys usually enlarge
progressively, causing pain. Other abnormalities, such as
hematuria, renal and urinary infection, renal tumors,
salt and water imbalance and hypertension, frequently result
from the renal defect. Cystic abnormalities in other organs,
including the liver, pancreas, spleen and ovaries, are
commonly found in ADPKD. Massive liver enlargement
occasionally causes portal hypertension and hepatic failure.
Cardiac valve abnormalities and an increased frequency of
subarachnoid and other intracranial hemorrhage have also been
observed in ADPKD. Progressive renal failure causes death in
many ADPKD patients, and dialysis and transplantation are
frequently required to maintain life in these patients.
Although end-stage renal failure usually supervenes in middle
age (ADPKD is sometimes called adult polycystic kidney
disease), children may occasionally have severe renal cystic
disease.

Although studies of kidneys from ADPKD patients have
demonstrated a number of different biochemical, structural
and physiological abnormalities, the disorder's underlying
causative biochemical defect remains unknown. Biochemical
abnormalities which have been observed have involved protein
sorting, the distribution of cell membrane markers within
renal epithelial cells, extracellular matrix, ion transport,
epithelial cell turnover, and epithelial cell proliferation.
The most carefully documented of these findings are
abnormalities in the composition of tubular epithelial cells
and a reversal of the normal polarized distribution of cell
membrane proteins, such as the Na+/K+ ATPase (Carone, F.A. et
al., 1994, Lab. Invest. 70:437-448).

As the name implies, ADPKD is inherited as an autosomal
dominant disorder. Three distinct loci have been shown to
cause phenotypically indistinguishable forms of the disease, with
greater than 85-90% of disease incidence being due to
mutations which map to the short arm of chromosome 16, as
discussed below. Despite intensive investigation, the
molecular defect responsible for ADPKD is not known.

In 1985, Reeders et al. (Reeders et al., 1985, Nature 317:542)
carried out genetic linkage studies of a large number
of ADPKD families and demonstrated that a gene on the short
arm of chromosome 16 was mutated in most cases of ADPKD.
This gene has been designated PKD1 by the Nomenclature
Committee of the Human Gene Mapping Workshop and the Genome
Data Base of the Welch Library, Johns Hopkins University.

Further linkage studies have identified a set of genetic
markers that flank the gene-rich region containing the PKD1
gene (Reeders et al., 1988, Genomics 3:150; Somlo et al.,
1992, Genomics 13:152; Breuning et al., 1990, J. Med. Genet.
27:603; Germino et al., 1990, Am. J. Hum. Genet. 46:925).

These markers have been mapped by a variety of physical
mapping techniques, including fluorescence in situ
hybridization and pulsed-field gel electrophoresis (Gillespie
et al., 1990, Nucleic Acids Research 18:7071). It has been
shown that the closest distal genetic marker (D16S259, on the
telomeric side of the PKD1 locus) lies within 750 kb of the
closest proximal genetic marker (D16S25, on the centromeric
side of the PKD1 locus). The interval between the genetic
markers has been cloned in a series of overlapping cosmid and
bacteriophage genomic clones (Germino et al., 1992, Genomics
13:144), which contain the entire PKD1 interval, with the
exception of two gaps of less than 10 kb and less than 50 kb.
Restriction mapping of these clones has confirmed that the
interval between the flanking genetic markers is 750 kb.

While genetic mapping studies such as these have begun
to narrow the region within the human genome in which the
gene responsible for ADPKD lies, there exist an estimated
twenty or more genes within this 750 kb interval. Given the
prevalence and severity of ADPKD, however, it is of great
importance to elucidate which, if any, of these postulated
genes corresponds to PKD1.
3. SUMMARY OF THE INVENTION
The present invention relates to methods and
compositions for the diagnosis and treatment of autosomal
dominant polycystic kidney disease (ADPKD). Specifically, a
novel gene, referred to as the PKD1 gene, is described in
Section 5.1. Mutations within the PKD1 gene are responsible
for approximately 90% of cases of ADPKD. Additionally, the PKD1
gene product, including the nucleotide sequence of the
complete coding region, is described in Section 5.2.
Antibodies directed against the PKD1 gene product are
described in Section 5.3.

Further, the present invention relates to therapeutic
methods and compositions for the amelioration of ADPKD
symptoms. These therapeutic techniques are described in
Sections 5.9 and 5.10. Methods are additionally presented
for the identification of compounds that modulate the level
of expression of the PKD1 gene or the activity of PKD1 mutant
gene products, and for the evaluation and use of such compounds
as therapeutic ADPKD treatments. Such methods are described
in Section 5.8.

Still further, the present invention relates to
prognostic and diagnostic, including prenatal, methods and
compositions whereby the PKD1 gene and/or gene product can be
used to identify individuals carrying mutant PKD1 alleles or
exhibiting an abnormal level of PKD1 gene product or gene
product activity. Additionally, the present invention
describes methods which diagnose subjects exhibiting ADPKD
symptoms. Such techniques are described in Section 5.12.
Additionally, the present invention relates to the use of
PKD1 animal knockout screening assays for the identification
of compounds useful for the amelioration of ADPKD symptoms.

The coding region of the PKD1 gene is complex and
extensive, having a size of approximately 60 kb and
containing a total of 46 exons, the sequence of which, until
now, has been difficult to obtain for a number of reasons.
First, the majority (approximately the first two thirds) of
the PKD1 gene is duplicated several times, in a transcribed
fashion, elsewhere in the genome, thus making it very
difficult to distinguish authentic PKD1 sequence from PKD1-like
sequence. Further, the PKD1 gene contains extensive
repeated regions of high GC content which are not only
difficult to sequence accurately, but additionally make the
alignment of PKD1 nucleotide sequence extremely difficult.
Still further, the PKD1 gene encodes a large transcript of
approximately 14.5 kb in length, and evidence exists that
there are alternatively spliced forms of the gene. Thus, the
size of the PKD1 gene and the size and complexity of the PKD1
transcript, coupled with the above-described PKD1 features,
made the successful sequencing of the gene and its cDNA very
difficult. As described in Section 5.1.2 and in the Example
presented in Section 10, below, however, the obstacles to
sequencing the PKD1 gene have now, for the first time, been
overcome.

The PKD1 transcript, which is approximately 14.5 kb in
length, encodes a PKD1 gene product with a derived amino acid
sequence of 4304 amino acid residues. This PKD1 gene product
contains at least five distinct peptide domains which are
likely to be involved in protein-protein and/or protein-carbohydrate
interactions. Further, this PKD1 gene product
shares amino acid sequence similarity with a number of
extracellular matrix proteins. These features of the PKD1
gene product indicate that ADPKD is caused by a biochemical
defect involving extracellular signalling and/or
extracellular matrix assembly, and suggest therapeutic
strategies whereby ADPKD can be treated and/or whereby ADPKD
symptoms can be ameliorated.

The Examples described in Sections 6 through 11, below,
demonstrate the successful identification and
characterization of the PKD1 gene and gene product, including
the complete nucleotide sequence of the PKD1 coding region,
the complete amino acid sequence, and the elucidation of the
protein structure of the PKD1 gene product. Further, an
ADPKD-causing mutation is identified and described.
4. DESCRIPTION OF THE FIGURES
FIG. 1. A map of the PKD1 interval showing the cosmids
and bacteriophage clones covering the region (taken from
Germino et al., 1992, Genomics 13:144). The PKD1 region as
defined by flanking markers extends from D16S259 (pGGG1) to
D16S25, a span of approximately 750 kb. Single-copy probes
used in pulsed-field gel mapping of the region are shown
above the line (pGGG1, CMM65b, etc.). C, M, P, N and B are
sites for the restriction enzymes ClaI, MluI, PvuI, NotI and
BssHII, respectively. Sites that cleave in genomic DNA from
only some tissues are shown in parentheses. Bold bars (a-z,
aa) represent the extents of the coding regions (see Table
2). Horizontal lines 1-38 represent cosmid and phage clones
FIG. 2. A map of the PKD1 region as defined by flanking
markers. The region extends from D16S259 (pGGG1) to W5.2CA,
a microsatellite repeat that lies within ALCNw5.2, a span of
approximately 480 kb. The labels are as for FIG. 1.

FIG. 3A-B. Genomic DNA from 40 unrelated ADPKD patients
was amplified by PCR for SSCP analysis. Primers F23 and R23
(see Table 1, below) were used to amplify an exon of 298 bp.
Variant SSCP patterns were seen in two ADPKD patients under
the following conditions. Each of the patients was
heterozygous for the normal pattern and the variant pattern.
The pattern seen in these patients was not seen in normal
individuals. The arrow indicates non-denatured DNA.

FIG. 4. A map (not to scale), derived from the cosmid
contig cGGG1, cGGG10 and cDEB11, of the genomic region
containing the PKD1 gene. The horizontal black bars show the
positions of the three cosmids. The discontinuities in these
bars indicate that the full extents of cGGG1 and cDEB11 are
not shown. The map was constructed using restriction enzyme
data from several enzymes. BamHI, EcoRI and NotI restriction
sites are shown. The numbers below the horizontal line
represent distances in kilobases between adjacent restriction
sites. The PKD1 cDNA clones are shown above as grey bars.
These clones hybridize to the restriction fragments shown
immediately below them in the genomic map.

FIG. 5A. Structure of the PKD1 gene transcript. The bar
at the top represents the PKD1 exon map. A total of 46 exons
were identified. Below the gene transcript map are
depictions of the overlapping cDNA clones, with putative
alternatively spliced regions as indicated.

FIG. 5B. PKD1 exons. This chart lists PKD1 exon sizes
and indicates which cDNA clones contain nucleotide sequences
corresponding to sequences present within specific exons.

FIG. 6. PKD1 nucleotide and amino acid sequences.
Depicted herein are, top line, the nucleotide sequence of the
entire PKD1 coding region (SEQ ID NO: 1), and, bottom line,
the PKD1 derived amino acid sequence (SEQ ID NO: 2), given in
the one-letter amino acid code.

FIG. 7. The derived amino acid sequence of the PKD1 gene
product (SEQ ID NO: 2). The putative peptide domains of the
PKD1 gene product are depicted underneath the amino acid
sequence.
FIG. 8. A schematic representation of the PKD1 gene
product, with each of its putative domains illustrated.

FIG. 9. SSCP analysis. Genomic DNA from a total of 60
unrelated ADPKD patients was amplified by PCR for SSCP
analysis. Intronic primers F25 and Mill-IR (see Section
10.1, below) were used for amplification. A variant SSCP
pattern was seen in one individual. The amplified DNA from
this individual was then reamplified with the intronic
primers KG8-F31 and KG8-R35 (see Section 10.1, below). Both
strands of the reamplified DNA were sequenced, using F25 and
Mill-IR as sequencing primers. As discussed in Section 10.2,
below, sequencing revealed a C to T transition which created
a stop codon at PKD1 amino acid position 765. The pattern
seen in this patient was not seen in normal individuals.
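The arithmetic behind locating such a nonsense mutation can be sketched as follows. This is a minimal illustration, not the analysis actually performed: it assumes coding-sequence coordinates that count the A of the ATG start codon as nucleotide 1, and the example codons are generic, since the text does not state which codon at position 765 was altered. The helper names `codon_span` and `c_to_t_creates_stop` are hypothetical.

```python
# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_span(aa_position: int) -> tuple:
    """Return the 1-based (first, last) coding-sequence coordinates of the codon
    for a 1-based residue position, counting the A of ATG as nucleotide 1."""
    first = (aa_position - 1) * 3 + 1
    return first, first + 2


def c_to_t_creates_stop(codon: str) -> list:
    """Return 0-based positions in `codon` at which a C->T transition yields a stop codon."""
    codon = codon.upper()
    hits = []
    for i, base in enumerate(codon):
        if base == "C":
            mutant = codon[:i] + "T" + codon[i + 1:]
            if mutant in STOP_CODONS:
                hits.append(i)
    return hits


if __name__ == "__main__":
    print(codon_span(765))  # (2293, 2295)
    # Glutamine (CAG, CAA) and arginine (CGA) codons become TAG, TAA and TGA.
    for codon in ("CAG", "CAA", "CGA", "TCG"):
        print(codon, c_to_t_creates_stop(codon))
```

Under these assumptions, only a C in the first position of a CAA, CAG or CGA codon can produce a stop by a single C to T transition.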

5. DETAILED DESCRIPTION OF THE INVENTION
Methods and compositions for the diagnosis and treatment of
ADPKD are described herein. Specifically, the gene
referred to herein as the PKD1 gene, mutations in which are
responsible for the vast majority of ADPKD cases, is
described. Further, the PKD1 gene product and antibodies
directed against the PKD1 gene product are also presented.
Therapeutic methods and compositions are described for the
treatment and amelioration of ADPKD symptoms. Further,
methods for the identification of compounds that modulate the
level of expression of the PKD1 gene or the activity of
mutant PKD1 gene product, and for the evaluation and use of such
compounds in the treatment of ADPKD symptoms, are also
provided.

Still further, prognostic and diagnostic methods are
described for the detection of mutant PKD1 alleles and of
abnormal levels of PKD1 gene product or of gene product
activity.
5.1. THE PKD1 GENE
The PKD1 gene, mutations in which are responsible for
more than 9 in 10 cases of ADPKD, is described herein.
Specifically, the strategy followed to identify the PKD1 gene
is briefly discussed, as is the strategy for obtaining the
complete nucleotide sequence of the gene. Further, the PKD1
nucleotide sequence and alternative splicing features are
described. Still further, nucleic acid sequences that
hybridize to the PKD1 gene and which may be utilized as
therapeutic ADPKD treatments and/or as part of diagnostic
methods are described. Additionally, methods for the
production or isolation of such PKD1 nucleic acid molecules
and PKD1-hybridizing molecules are described.

5.1.1. IDENTIFICATION OF THE PKD1 GENE

Prior to the present invention, it had only been known that
the PKD1 gene lay somewhere within a 750 kb chromosomal region on
the short arm of chromosome 16. As presented herein, the
interval in which this gene lies has now been progressively
narrowed, and the specific PKD1 gene has been identified
within this large stretch of DNA.

Briefly, the strategy which was followed to identify the
PKD1 gene is as follows. First, as demonstrated in
the Example presented in Section 6, below, the 750 kb PKD1
interval was substantially narrowed to approximately
460 kb via genetic linkage studies. Next, as shown in the
Example presented in Section 7, below, a maximum of 27
transcriptional units (TUs) were identified within this
approximately 460 kb PKD1 interval. The total length of
these TUs was approximately 300 kb. Thus, the region
containing the PKD1 coding region was narrowed down to a
region of approximately 300 kb.

Next, as presented in the Example shown in Section 9,
below, a Northern analysis was conducted with mRNA isolated
from normal and ADPKD patient kidney tissue, in order to
compare the pattern of ADPKD pathology to the
expression profile of the TUs within the PKD1 interval. One
of the TUs, Nik9, was eliminated by this analysis, which
indicated undetectable expression of Nik9 in the kidney and liver.
In addition, as demonstrated in the Example presented in
Section 9, below, a systematic search was undertaken using
several independent techniques, including Southern analysis,
SSCP, DGGE and direct sequencing of coding sequences, to
detect mutations in ADPKD patients within the TUs of the PKD1
region. By conducting such a mutation screen, greater than
80% of the combined identified coding sequences in the PKD1
region were excluded, thus further substantially narrowing
the region in which the PKD1 gene could lie. The screen
was initially performed on individual genes until virtually
all the coding sequences were shown to be devoid of
mutations. The focus on possible PKD1 candidates was further
honed by the recognition that PKD1 demonstrates one of the
highest new mutation rates known for human diseases. Based
on this observation, it was hypothesized that either the PKD1
gene contained a highly mutable site or that the gene
presented a large number of potential mutation sites, each
mutable at a regular frequency. Such a hypothesis is
supported by the absence of substantial linkage
disequilibrium among selected population groups. Further,
this hypothesis predicted that if the PKD1 gene was a small
transcript, it should contain a highly mutable element.

Trinucleotide repeat expansion represents one of the major
sources of dominant mutations, such as the ADPKD-causing
mutations which arise in the PKD1 gene. A systematic search
for such highly mutable trinucleotide repeats was conducted
within the TUs in the remaining region wherein PKD1 could
lie, but no such repeats were identified.
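A scan of this kind can be sketched on sequence data with a regular expression. This is a hypothetical illustration, not the search actually performed: the text does not say how the search was carried out, and `find_trinucleotide_repeats` and its default threshold of five tandem copies are assumptions.

```python
import re


def find_trinucleotide_repeats(seq: str, min_copies: int = 5) -> list:
    """Return (start, unit, copies) for each tandem trinucleotide repeat with at
    least `min_copies` consecutive copies; 0-based start coordinates."""
    if min_copies < 2:
        raise ValueError("min_copies must be at least 2")
    seq = seq.upper()
    # ([ACGT]{3}) captures one unit; \1{n,} requires n further identical copies.
    pattern = re.compile(r"([ACGT]{3})\1{%d,}" % (min_copies - 1))
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue  # homopolymer run such as AAAAAA, not a trinucleotide repeat
        hits.append((m.start(), unit, len(m.group(0)) // 3))
    return hits


if __name__ == "__main__":
    print(find_trinucleotide_repeats("AACCTCCTCCTCCTCCTGG", min_copies=4))  # [(2, 'CCT', 5)]
```

Because the regular expression reports non-overlapping matches from left to right, a repeat is reported in the phase in which it is first encountered (here CCT rather than CTC).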

The only other explanation for the high mutational
prevalence is that the gene is physically large and presents
a large target for mutations. Of the TUs, nik823, within the
potential PKD1 region that had not been excluded by other
means, only two were of a size that could potentially support
such a high mutation rate. As demonstrated in the Example
presented below in Section 9, a search for ADPKD-correlated
mutations within one of these TUs failed to
identify any such mutations, causing it to be excluded as a
candidate PKD1 gene. Ultimately, as demonstrated in the
Example presented in Section 10, below, one of these
polymorphisms has been shown to be a de novo mutation which
is predicted to lead to the production of a truncated PKD1
protein in the affected individual. These findings are highly
suggestive, if not proof, that the identified gene is the
PKD1 gene.

Thus, the examples presented below in Sections 6 through 11
demonstrate, through a variety of techniques, the genetic and
molecular characterization of the PKD1 region, and ultimately
demonstrate that the PKD1 gene, dominant mutations in which
cause ADPKD, has been identified.

5.1.2. SEQUENCING OF THE PKD1 GENE
As discussed below in Section 5.1.3, the entire coding
region of the PKD1 gene has now
successfully been isolated and sequenced. In order to
achieve this goal, however, a number of PKD1-specific
impediments had to be overcome. The strategy for obtaining
the PKD1 gene sequence is discussed briefly in this
Section. The Example presented below, in Section 11,
discusses this sequencing strategy in more detail.

First, the PKD1 gene is very large (approximately 60
kb), as is the PKD1 transcript (approximately 14.5 kb
in length). In addition to this size difficulty,
approximately the 5' two thirds of the gene is
duplicated several times in a highly similar, transcribed
fashion elsewhere in the human genome (Germino, G.G. et al.,
1992, Genomics 13:144-151; European Chromosome 16 Tuberous
Sclerosis Consortium, 1993, Cell 75:1305-1315).

The near-identity of the sequences of cDNA derived from
PKD1 and from the PKD1-like duplications made the likelihood
of piecing together a full-length PKD1 transcript by merely
screening cDNA libraries via hybridization very low. Such a
screening method would be as likely to identify transcripts
originating from the PKD1-like duplicated regions as
from the authentic PKD1 locus. In fact, if each of the
duplicated loci were as transcriptionally active as the
authentic PKD1 locus, the representation of authentic PKD1
cDNA clones among the total positive clones would be very
low.
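The point can be made quantitative under simple, assumed conditions. The sketch below (the function name is hypothetical) assumes that each duplicated locus is expressed at a fixed level relative to authentic PKD1 and that all copies hybridize equally well to the probe; the text says only that the region is duplicated "several times," so the number of duplicates is a free parameter, not a measured value.

```python
def authentic_fraction(n_duplicates: int, relative_expression: float = 1.0) -> float:
    """Expected fraction of hybridization-positive cDNA clones that derive from the
    authentic locus, assuming each of `n_duplicates` loci is expressed at
    `relative_expression` times the authentic level and all hybridize equally."""
    if n_duplicates < 0 or relative_expression < 0:
        raise ValueError("arguments must be non-negative")
    return 1.0 / (1.0 + n_duplicates * relative_expression)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, round(authentic_fraction(n), 3))
```

For example, with three equally active duplicates only one positive clone in four would be expected to derive from the authentic locus.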

Thus, a strategy was developed for obtaining the
authentic PKD1 sequence which included: first, obtaining the
highest-quality genomic sequence spanning the duplicated region,
as well as duplicate coverage of cDNA sequence spanning the
expected length of the PKD1 transcript; second, comparing the
cDNA sequences to the genomic sequence spanning the duplicated
region, thus identifying PKD1 exons; and, finally, assembling the
identified exons into a full-length PKD1 coding sequence.
The isolation of both PKD1 genomic and cDNA sequence and,
further, the alignment of such sequences, however, proved to
be very difficult.

PKD1 genomic DNA (which totals approximately 60 kb)
proved to be particularly difficult to characterize for a
number of reasons. First, portions of PKD1 genomic DNA
(specifically, regions within cosmid cGGG10) tended to be
preferentially subcloned. For example, screens for
trinucleotide repeats in the cGGG10 cosmid identified one
CCT-positive subclone in a Sau3A-generated library of cGGG10
subclones. This region was, however, vastly underrepresented
in both the Sau3A library (i.e., approximately 1 clone out of
over 10,000) and subsequent sheared cosmid libraries (in
which no such clones were isolated). A plasmid subclone
containing the region, G13, proved difficult to grow and to
sequence. Sequence analysis of the clone revealed a highly
monotonous series of purines (A and G). Such sequences are
thought to make the clone difficult to propagate stably in
bacteria. Thus, in order to ascertain the level of
representation of the cosmid, it was necessary to construct a
detailed physical map of the cGGG10 cosmid.
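Purine-rich stretches such as the one found in G13 can be located in sequence data with a short scan. This is a minimal sketch, assuming a hypothetical `purine_runs` helper and an arbitrary 20-base default threshold; the text does not give the length of the purine series in G13.

```python
import re


def purine_runs(seq: str, min_len: int = 20) -> list:
    """Return half-open (start, end) coordinates of uninterrupted A/G runs
    at least `min_len` bases long."""
    if min_len < 1:
        raise ValueError("min_len must be positive")
    return [(m.start(), m.end()) for m in re.finditer(r"[AG]{%d,}" % min_len, seq.upper())]


if __name__ == "__main__":
    print(purine_runs("CCAGGAGAAGGTT", min_len=5))  # [(2, 11)]
```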
Second, genomic sequence within the PKD1 region is very
GC-rich (approximately 70%) and forms extensive, stable
secondary structures. These PKD1 genomic DNA features made
the task of obtaining accurate nucleotide sequence very
difficult. Several alternative sequencing conditions,
including different polymerases, melting conditions,
polymerization conditions and combinations thereof, had to be
utilized before such sequence was obtained. However, even
when reliable nucleotide sequence became available, the
extensive amount of repeated sequence within the genomic DNA
made the alignment of sequence information very difficult. It
therefore became necessary, for accurate alignment of sequences,
to use the fine physical map which had been
created earlier.
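The GC content cited above is a simple ratio, and locally GC-rich stretches can be flagged with a sliding window. The sketch below is illustrative only; the helper names, the window size, the step and the 0.7 threshold are assumptions, not parameters used in the work described.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in `seq` (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(seq: str, window: int = 100, step: int = 50, threshold: float = 0.7) -> list:
    """Return (start, gc) for each full-length window whose GC fraction is >= threshold."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_fraction(seq[start:start + window])
        if gc >= threshold:
            hits.append((start, gc))
    return hits


if __name__ == "__main__":
    demo = "GGCC" * 10 + "AATT" * 10
    print(round(gc_fraction(demo), 2))  # 0.5
    print(gc_rich_windows(demo, window=40, step=20))  # [(0, 1.0)]
```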

The sequencing of PKD1 cDNA also presented a number of
PKD1-specific difficulties. First, the 14 kb size of the
transcript made it impossible to isolate a single cDNA clone
containing the entire PKD1 transcript. Overlapping partial
cDNA clones, therefore, had to be obtained in order to piece
together an entire sequence. Partial cDNA clones were
obtained by sequencing the ends of one cDNA insert,
synthesizing probes using this sequence, and obtaining
overlapping cDNA clones by their hybridization to such
probes. Second, the PKD1 gene was poorly represented in
renal cDNA libraries, and, in fact, its expression appeared
to be low in a number of tissues, making the isolation of
PKD1 cDNA clones especially difficult.
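Piecing together an entire sequence from overlapping partial clones amounts to merging each clone onto a growing contig at its suffix-prefix overlap. The sketch below is a simplified illustration under stated assumptions: the clones are already ordered 5' to 3', they share exact overlaps of at least `min_overlap` bases, and none is contained within another. The function names are hypothetical. Because the PKD1 region is rich in repeated sequence, an exact-overlap merge like this can join clones at a spurious repeat, which is why independent evidence such as a physical map is needed in practice.

```python
def suffix_prefix_overlap(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`,
    or 0 if no such overlap is at least `min_overlap` bases long."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge_ordered_clones(clones: list, min_overlap: int = 20) -> str:
    """Merge 5'-to-3' ordered, overlapping clone sequences into one contig."""
    if not clones:
        raise ValueError("no clones given")
    if min_overlap < 1:
        raise ValueError("min_overlap must be positive")
    contig = clones[0]
    for clone in clones[1:]:
        k = suffix_prefix_overlap(contig, clone, min_overlap)
        if k == 0:
            raise ValueError("insufficient overlap: a gap remains between clones")
        contig += clone[k:]
    return contig


if __name__ == "__main__":
    clones = ["ATGCGTACGT", "ACGTTTGA", "TTGACCC"]
    print(merge_ordered_clones(clones, min_overlap=4))  # ATGCGTACGTTTGACCC
```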

5.1.3. THE PKD1 GENE
Described herein is the complete nucleotide sequence of
the extensive PKD1 gene coding region. Further, PKD1
alternative splicing features are discussed below.

The coding region of the PKD1 gene is complex and
extensive, containing a total of 46 exons and producing a
transcript of approximately 14 kb in length. FIG. 5A depicts
the structure of the PKD1 gene transcript. A total of 46
exons were identified within the PKD1 gene. Additionally,
further refer to a nucleotide sequence which encodes a gene
product of 1, 2 or 3, as described earlier in this paragraph.

The invention also includes nucleic acid molecules,
preferably DNA molecules, that hybridize to, and are
therefore the complements of, the DNA sequences (a) through
(c) in the preceding paragraph. Such hybridization
conditions may be highly stringent or less highly stringent,
as described above. In instances wherein the nucleic acid
molecules are oligonucleotides ("oligos"), highly stringent
conditions may refer, e.g., to washing in 6xSSC/0.05% sodium
pyrophosphate at 37°C (for 14-base oligos), 48°C (for 17-base
oligos), 55°C (for 20-base oligos), and 60°C (for 23-base
oligos). These nucleic acid molecules may act as PKD1
antisense molecules, useful, for example, in PKD1 gene
regulation and/or as antisense primers in amplification
reactions of PKD1 nucleic acid sequences. Further, such
sequences may be used as part of ribozyme and/or triple helix
sequences, also useful for PKD1 gene regulation. Still
further, such molecules may be used as components of
diagnostic methods whereby the level of PKD1 transcript may
be deduced and/or the presence of an ADPKD-causing allele may
be detected. Further, such sequences can be used to screen
for and identify PKD1 homologs from, for example, other
species.

The invention also encompasses (a) DNA vectors that
contain any of the foregoing coding sequences and/or their
complements (i.e., antisense); (b) DNA expression vectors
that contain any of the foregoing coding sequences
operatively associated with a regulatory element that directs
the expression of the coding sequences; and (c) genetically
engineered host cells that contain any of the foregoing
coding sequences operatively associated with a regulatory
element that directs the expression of the coding sequences
in the host cell. As used herein, regulatory elements
include but are not limited to inducible and non-inducible
promoters, enhancers, operators and other elements known to
those skilled in the art that drive and regulate expression.
For example, such regulatory elements may include CMV
immediate early gene regulatory sequences, SV40 or adenovirus
early or late promoter sequences, or lac system, trp system, tac
system or trc system sequences. The invention includes
fragments of any of the DNA sequences disclosed herein.

In addition to the PKD1 gene sequences described above,
homologs of the PKD1 gene of the invention, as may, for
example, be present in other, non-human species, may be
identified and isolated by molecular biological techniques
well known in the art, for example, using labeled probes as
small as 12 bp. Further, mutant PKD1 alleles and additional
normal alleles of the human PKD1 gene of the invention may
be identified using such techniques. Still further, there
may exist genes at other genetic loci within the human genome
that encode proteins which have extensive homology to one or
more domains of the PKD1 gene product. Such genes may also
be identified via such techniques.

For example, such a previously unknown PKD1-type gene
sequence may be isolated by performing a polymerase chain
reaction (PCR; the experimental embodiment set forth by
Mullis, K.B., 1987, U.S. Patent No. 4,683,202) using two
degenerate oligonucleotide primer pools designed on the basis
of amino acid sequences within the PKD1 gene product described herein
(see, e.g., FIG. 6, SEQ ID NO: 2). The template for the
reaction may be cDNA obtained by reverse transcription of
mRNA prepared from human or non-human cell lines or tissue
known to express a PKD1 allele or PKD1 homologue. The PCR
product may be subcloned and sequenced to ensure that the
amplified sequences represent the sequences of a PKD1 or a
PKD1-like nucleic acid sequence. The PCR fragment may then be
used to isolate a full-length PKD1 cDNA clone by
radioactively labeling the amplified fragment and screening a
bacteriophage cDNA library. Alternatively, the labeled
fragment may be used to screen a genomic library. For a
review of cloning strategies which may be used, see, e.g.,
Maniatis, 1989, Molecular Cloning, A Laboratory Manual, Cold
Spring Harbor Press, N.Y.; and Ausubel et al., 1989, Current
Protocols in Molecular Biology (Greene Publishing Associates
and Wiley Interscience, N.Y.).
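The size of such a degenerate primer pool follows from the genetic code: a pool covering a peptide must include every codon for each residue, so its degeneracy is the product of the codon counts. The sketch below (helper names hypothetical) uses the standard genetic code and picks the least degenerate window of a protein sequence, a common design heuristic that is not taken from the text; it ignores refinements such as truncating the final codon or using inosine.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (one-letter codes).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def primer_degeneracy(peptide: str) -> int:
    """Number of distinct oligonucleotides needed to encode every codon choice for `peptide`."""
    return prod(CODONS_PER_AA[aa] for aa in peptide.upper())


def least_degenerate_window(protein: str, length: int = 7) -> tuple:
    """Return (0-based start, peptide, degeneracy) for the window of `length` residues
    with the lowest primer degeneracy; ties go to the earliest window."""
    if not 0 < length <= len(protein):
        raise ValueError("window length out of range")
    start = min(range(len(protein) - length + 1),
                key=lambda i: primer_degeneracy(protein[i:i + length]))
    peptide = protein[start:start + length]
    return start, peptide, primer_degeneracy(peptide)


if __name__ == "__main__":
    print(primer_degeneracy("MKLW"))  # 12
    print(least_degenerate_window("LLLLMWMLL", length=3))  # (4, 'MWM', 1)
```

Residues with single codons (Met, Trp) keep the pool small, which is why windows rich in them are usually preferred.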

5.2. THE PKD1 GENE PRODUCT
The PKD1 gene products of the invention include the PKD1
gene product (SEQ ID NO: 2) encoded by the PKD1 nucleotide
sequence depicted in FIG. 6. The PKD1 gene product shown in
FIG. 6 is a protein of 4304 amino acid residues, with a
predicted mass of approximately 467 kilodaltons. This PKD1
gene product contains at least five distinct peptide domains
which are likely to be involved in protein-protein and/or
protein-carbohydrate interactions. Further, this PKD1 gene
product shares amino acid sequence similarity with a number
of extracellular matrix proteins. (See FIGS. 7 and 8, which
depict the PKD1 gene product domains.) The PKD1 gene product
domains are more fully described below, in the Example
presented in Section 10.
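As a rough consistency check (an illustration, not a calculation from the text), dividing the predicted mass by the residue count gives the implied average residue mass, which is close to the approximately 110 Da commonly used as a rule of thumb for proteins:

```latex
\bar{m}_{\text{residue}} \approx \frac{467{,}000\ \text{Da}}{4304} \approx 108.5\ \text{Da},
\qquad 4304 \times 110\ \text{Da} \approx 473\ \text{kDa}.
```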

In addition, PKD1 gene products that represent
functionally equivalent gene products are within the scope of
the invention. "Functionally equivalent," as used herein, is
as defined in Section 5.1, above. Such an equivalent PKD1
gene product may contain deletions, additions or
substitutions of amino acid residues within the PKD1 sequence
encoded by the PKD1 gene sequences described above in
Section 5.1.3, but which result in a silent change, thus
producing a functionally equivalent PKD1 protein. Such amino
acid substitutions may be made on the basis of similarity in
polarity, charge, solubility, hydrophobicity, hydrophilicity,
and/or the amphipathic nature of the residues involved. For
example, negatively charged amino acids include aspartic acid
and glutamic acid; positively charged amino acids include
lysine and arginine; amino acids with uncharged polar head
groups having similar hydrophilicity values include the
following: leucine, isoleucine, valine, glycine, alanine,
asparagine, glutamine, serine, threonine, phenylalanine and
tyrosine. As used herein, a functionally equivalent PKD1 gene
product refers to a protein that exhibits substantially the same
biological activity as the PKD1 gene product encoded by the
PKD1 gene sequences described in Section 5.1.1, above.

PKD1 gene products and peptides substantially similar to
the PKD1 gene product encoded by the PKD1 gene sequences
described in Section 5.1, above, which cause ADPKD symptoms
are also intended to fall within the scope of the invention.
Such gene products and peptides may include dominant mutant
PKD1 gene products, or PKD1 gene products functionally
equivalent to such mutant PKD1 gene products. By
"functionally equivalent mutant PKD1 gene product" is
meant PKD1-like proteins that exhibit a biological activity
substantially similar to the activity demonstrated by
dominant mutant PKD1 gene products.

The PKD1 wild type or mutant protein may be purified
from natural sources, as discussed in Section 5.2.1, below,
or may, alternatively, be chemically synthesized or
recombinantly expressed, as discussed in Section 5.2.2,
below.

5.2.1. PKD1 PROTEIN PURIFICATION METHODS

The PKD1 protein may be substantially purified from
natural sources (e.g., purified from cells) using protein
separation techniques well known in the art. "Substantially
purified" signifies purified away from at least about 90% (on
a weight basis), and in certain cases from at least about 99%,
of other proteins, glycoproteins, and other macromolecules
normally found in such natural sources.
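
Because the definition above is a weight-basis threshold, it can be checked with simple arithmetic. The sketch below is purely illustrative: the masses are hypothetical (none come from this document), the function names are invented for the example, and it assumes the contaminating material can be quantified by weight before and after purification.

```python
def fraction_contaminants_removed(contaminant_mg_before: float,
                                  contaminant_mg_after: float) -> float:
    """Fraction, by weight, of contaminating macromolecules removed."""
    if contaminant_mg_before <= 0:
        raise ValueError("starting contaminant mass must be positive")
    if not 0 <= contaminant_mg_after <= contaminant_mg_before:
        raise ValueError("final contaminant mass must lie in [0, starting mass]")
    return 1.0 - contaminant_mg_after / contaminant_mg_before


def meets_threshold(removed: float, threshold: float = 0.90) -> bool:
    """Compare against the 90% level (or another level, e.g. 0.99)."""
    return removed >= threshold


# Hypothetical run: 200 mg of other proteins in the crude extract,
# 4 mg still present after purification, i.e. 98% removed.
removed = fraction_contaminants_removed(200.0, 4.0)
print(f"{removed:.0%}", meets_threshold(removed), meets_threshold(removed, 0.99))
```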

Such purification techniques may include, but are not
limited to, ammonium sulfate precipitation, molecular sieve
chromatography, and/or ion exchange chromatography.
Alternatively, or additionally, the PKD1 gene product may be
purified by immunoaffinity chromatography using an
immunoabsorbent column to which an antibody is immobilized
which is capable of binding the PKD1 gene product. Such an
antibody may be monoclonal or polyclonal in origin. If the
PKD1 gene product is specifically glycosylated, the
glycosylation pattern may be utilized as part of a
purification scheme via, for example, lectin chromatography.

The cellular sources from which the PKD1 gene product
may be purified may include, but are not limited to, those
cells that are expected, by Northern and/or Western blot
analysis, to express the PKD1 gene. Preferably, such
cellular sources are renal tubular epithelial cells, biliary
duct cells, skeletal muscle cells, whole brain cells, lung
alveolar epithelial cells, and placental cells.

One or more forms of the PKD1 gene product may be
secreted out of the cell, i.e., may be extracellular. Such
extracellular forms of the PKD1 gene product may preferably
be purified from whole tissue rather than cells, utilizing
any of the techniques described above. Preferable tissue
includes, but is not limited to, those tissues that contain
cell types such as those described above. Alternatively,
PKD1-expressing cells such as those described above may be
grown in cell culture, under conditions well known to those
of skill in the art. The PKD1 gene product may then be
purified from the cell medium using any of the techniques
discussed above.

5.2.2. PKD1 PROTEIN SYNTHESIS AND EXPRESSION METHODS 

Methods for the chemical synthesis of polypeptides
(e.g., gene products) or fragments thereof are well known to
those of ordinary skill in the art; e.g., peptides can be
synthesized by solid phase techniques, cleaved from the resin
and purified by preparative high performance liquid
chromatography (see, e.g., Creighton, 1983, Proteins:
Structures and Molecular Principles, W.H. Freeman & Co.,
N.Y., pp. 50-60). The composition of the synthetic peptides
may be confirmed by amino acid analysis or sequencing, e.g.,
using the Edman degradation procedure (see, e.g., Creighton,
1983, supra at pp. 34-49). Thus, the PKD1 protein may be
chemically synthesized in whole or in part.

The PKD1 protein may additionally be produced by
recombinant DNA technology using the PKD1 nucleotide
sequences as described, above, in Section 5.1, coupled with
techniques well known in the art. Thus, methods for
preparing the PKD1 polypeptides and peptides of the invention
by expressing nucleic acid encoding PKD1 sequences are
described herein. Methods which are well known to those
skilled in the art can be used to construct expression
vectors containing PKD1 protein coding sequences and
appropriate transcriptional/translational control signals.
These methods include, for example, in vitro recombinant DNA
techniques, synthetic techniques and in vivo
recombination/genetic recombination. See, for example, the
techniques described in Maniatis et al., 1989, Molecular
Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory,
N.Y., and Ausubel et al., 1989, Current Protocols in Molecular
Biology, Greene Publishing Associates and Wiley Interscience,
N.Y., both of which are incorporated by reference herein in
their entirety. Alternatively, RNA capable of encoding PKD1
protein sequences may be chemically synthesized using, for
example, automated or semi-automated synthesizers. See, for
example, the techniques described in "Oligonucleotide
Synthesis", 1984, Gait, M.J. ed., IRL Press, Oxford, which is
incorporated by reference herein in its entirety.

A variety of host-expression vector systems may be
utilized to express the PKD1 coding sequences of the
invention. Such host-expression systems represent vehicles
by which the coding sequences of interest may be produced and
subsequently purified, but also represent cells which may,
when transformed or transfected with the appropriate
nucleotide coding sequences, exhibit the PKD1 protein of the
invention in situ. These include, but are not limited to,
microorganisms such as bacteria (e.g., E. coli, B. subtilis)
transformed with recombinant bacteriophage DNA, plasmid DNA
or cosmid DNA expression vectors containing PKD1 protein
coding sequences; yeast (e.g., Saccharomyces, Pichia)
transformed with recombinant yeast expression vectors
containing the PKD1 protein coding sequences; insect cell
systems infected with recombinant virus expression vectors
(e.g., baculovirus) containing the PKD1 protein coding
sequences; plant cell systems infected with recombinant virus
expression vectors (e.g., cauliflower mosaic virus, CaMV;
tobacco mosaic virus, TMV) or transformed with recombinant
plasmid expression vectors (e.g., Ti plasmid) containing the
PKD1 protein coding sequences; or mammalian
cell systems (e.g., COS, CHO, BHK, 293, 3T3) harboring
recombinant expression constructs containing promoters
derived from the genome of mammalian cells (e.g.,
metallothionein promoter) or from mammalian viruses (e.g.,
the adenovirus late promoter; the vaccinia virus 7.5K
promoter).

In bacterial systems, a number of expression vectors may
be advantageously selected depending upon the use intended
for the PKD1 protein being expressed. For example, when a
large quantity of such a protein is to be produced, for the
generation of antibodies or to screen peptide libraries,
vectors which direct the expression of high levels
of fusion protein products that are readily purified may be
desirable. Such vectors include, but are not limited to, the
E. coli expression vector pUR278 (Ruther et al., 1983, EMBO
J. 2:1791), in which the PKD1 protein coding sequence may be
ligated individually into the vector in frame with the lacZ
coding region so that a fusion protein is produced; pIN
vectors (Inouye & Inouye, 1985, Nucleic Acids Res. 13:3101-3109;
Van Heeke & Schuster, 1989, J. Biol. Chem. 264:5503-5509);
and the like. pGEX vectors may also be used to
express foreign polypeptides as fusion proteins with
glutathione S-transferase (GST). In general, such fusion proteins
are soluble and can easily be purified from lysed cells by
adsorption to glutathione-agarose beads followed by elution
in the presence of free glutathione. The pGEX vectors are
designed to include thrombin or factor Xa protease cleavage
sites so that the cloned PKD1 protein can be released from
the GST moiety.

In an insect system, Autographa californica nuclear
polyhedrosis virus (AcNPV) is used as a vector to express
foreign genes. The virus grows in Spodoptera frugiperda
natural and synthetic. The efficiency of expression may be
enhanced by the inclusion of appropriate transcription
enhancer elements, transcription terminators, etc. (see
Bittner et al., 1987, Methods in Enzymol. 153:516-544).

In addition, a host cell strain may be chosen which
modulates the expression of the inserted sequences, or
modifies and processes the gene product in the specific
fashion desired. Such modifications (e.g., glycosylation)
and processing (e.g., cleavage) of protein products may be
important for the function of the protein. Different host
cells have characteristic and specific mechanisms for the
post-translational processing and modification of proteins.
Appropriate cell lines or host systems can be chosen to
ensure the correct modification and processing of the foreign
protein expressed. To this end, eukaryotic host cells which
possess the cellular machinery for proper processing of the
primary transcript, glycosylation, and phosphorylation of the
gene product may be used. Such mammalian host cells include,
but are not limited to, CHO, VERO, BHK, HeLa, COS, MDCK, 293,
3T3, WI38, etc.

For long-term, high-yield production of recombinant
proteins, stable expression is preferred. For example, cell
lines which stably express the PKD1 protein may be
engineered. Rather than using expression vectors which
contain viral origins of replication, host cells can be
transformed with DNA controlled by appropriate expression
control elements (e.g., promoter and enhancer sequences,
transcription terminators, polyadenylation sites, etc.), and
a selectable marker. Following the introduction of the
foreign DNA, engineered cells may be allowed to grow for 1-2
days in an enriched medium, and then are switched to a
selective medium. The selectable marker in the recombinant
plasmid confers resistance to the selection and allows cells
to stably integrate the plasmid into their chromosomes and
grow to form foci which in turn can be cloned and expanded
into cell lines. This method may advantageously be used to
engineer cell lines which express the PKD1 protein. Such
in a change of amino acid sequence. Such substitutes may be
selected from other members of the class (i.e., non-polar,
positively charged or negatively charged) to which the amino
acid belongs; e.g., the nonpolar (hydrophobic) amino acids
include alanine, leucine, isoleucine, valine, proline,
phenylalanine, tryptophan, and methionine; the polar neutral
amino acids include glycine, serine, threonine, cysteine,
tyrosine, asparagine, and glutamine; the positively charged
(basic) amino acids include arginine, lysine, and histidine;
the negatively charged (acidic) amino acids include aspartic
and glutamic acid.
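
The class-based substitution rule above can be expressed as a lookup table. The Python sketch below encodes exactly the four classes listed in the preceding paragraph (one-letter codes); the names `CLASSES`, `RESIDUE_CLASS` and `is_conservative` are hypothetical, and membership in the same class is only the screening criterion stated in the text, not proof that a particular substitution in PKD1 is in fact silent.

```python
# Amino acid classes exactly as listed in the text (one-letter codes).
CLASSES = {
    "nonpolar": set("ALIVPFWM"),      # Ala, Leu, Ile, Val, Pro, Phe, Trp, Met
    "polar_neutral": set("GSTCYNQ"),  # Gly, Ser, Thr, Cys, Tyr, Asn, Gln
    "basic": set("RKH"),              # Arg, Lys, His
    "acidic": set("DE"),              # Asp, Glu
}

# Reverse index: residue code -> class name.
RESIDUE_CLASS = {aa: name for name, members in CLASSES.items() for aa in members}


def is_conservative(original: str, substitute: str) -> bool:
    """True if both residues belong to the same class defined above."""
    try:
        return RESIDUE_CLASS[original.upper()] == RESIDUE_CLASS[substitute.upper()]
    except KeyError as exc:
        raise ValueError(f"unknown residue code: {exc.args[0]}") from None


print(is_conservative("D", "E"))  # acidic -> acidic
print(is_conservative("K", "D"))  # basic -> acidic
```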

When used as a component in the assay systems described
herein, the PKD1 gene product or peptide (e.g., gene product
fragment) may be labeled, either directly or indirectly, to
facilitate detection of a complex formed between the PKD1
gene product and a test substance. Any of a variety of
suitable labeling systems may be used, including but not
limited to radioisotopes such as 125I; enzyme labeling
systems that generate a detectable colorimetric signal or
light when exposed to substrate; and fluorescent labels.

Where recombinant DNA technology is used to produce the
PKD1 protein for the assay systems described herein, it may
be advantageous to engineer fusion proteins that can
facilitate labeling, immobilization and/or detection. For
example, the coding sequence of the viral or host cell
protein can be fused to that of a heterologous protein that
has enzyme activity or serves as an enzyme substrate in order
to facilitate labeling and detection. The fusion constructs
should be designed so that the heterologous component of the
fusion product does not interfere with binding of the host
cell and viral protein.

Indirect labeling involves the use of a third protein,
such as a labeled antibody, which specifically binds to one
of the binding partners, i.e., either the PKD1 protein or its
binding partner used in the assay. Such antibodies include,
but are not limited to, polyclonal, monoclonal, chimeric,
single chain, Fab fragments and fragments produced by a Fab
expression library.

5.3. ANTIBODIES REACTIVE WITH PKD1 GENE PRODUCT
Described herein are methods for the production of
antibodies capable of specifically recognizing one or more
PKD1 gene product epitopes. Such antibodies may include, but
are not limited to, polyclonal antibodies, monoclonal
antibodies (mAbs), humanized or chimeric antibodies, single
chain antibodies, Fab fragments, F(ab')2 fragments, fragments
produced by a Fab expression library, anti-idiotypic
(anti-Id) antibodies, and epitope-binding fragments of any of
the above. Such antibodies may be used, for example, in the
detection of PKD1 gene product in a biological sample, or,
alternatively, as a method for the inhibition of abnormal
PKD1 activity. Thus, such antibodies may be utilized as part
of ADPKD treatment methods, and/or may be used as part of
diagnostic techniques whereby patients may be tested for
abnormal levels of PKD1 gene product, or for the presence of
abnormal forms of the PKD1 protein.

For the production of antibodies to PKD1, various host
animals may be immunized by injection with PKD1 protein, or a
portion thereof. Such host animals may include, but are not
limited to, rabbits, mice, and rats. Various adjuvants may
be used to increase the immunological response, depending on
the host species, including, but not limited to, Freund's
(complete and incomplete), mineral gels such as aluminum
hydroxide, surface active substances such as lysolecithin,
pluronic polyols, polyanions, peptides, oil emulsions,
keyhole limpet hemocyanin, dinitrophenol, and potentially
useful human adjuvants such as BCG (bacille Calmette-Guerin)
and Corynebacterium parvum.

Polyclonal antibodies are heterogeneous populations of
antibody molecules derived from the sera of animals immunized
with an antigen, such as PKD1, or an antigenic functional
derivative thereof. For the production of polyclonal
antibodies, host animals such as those described above may
be immunized by injection with PKD1 protein supplemented with
adjuvants as also described above.

Monoclonal antibodies, which are substantially
homogeneous populations of antibodies to a particular
antigen, may be obtained by any technique which provides for
the production of antibody molecules by continuous cell lines
in culture. These include, but are not limited to, the
hybridoma technique of Kohler and Milstein (1975, Nature
256:495-497; and U.S. Patent No. 4,376,110), the human B-cell
hybridoma technique (Kosbor et al., 1983, Immunology Today
4:72; Cole et al., 1983, Proc. Natl. Acad. Sci. USA 80:2026-2030),
and the EBV-hybridoma technique (Cole et al., 1985,
Monoclonal Antibodies And Cancer Therapy, Alan R. Liss, Inc.,
pp. 77-96). Such antibodies may be of any immunoglobulin
class, including IgG, IgM, IgE, IgA, IgD and any subclass
thereof. The hybridoma producing the mAb of this invention
may be cultivated in vitro or in vivo. Production of high
titers of mAbs in vivo makes this the presently preferred
method of production.

In addition, techniques developed for the production of
"chimeric antibodies" (Morrison et al., 1984, Proc. Natl.
Acad. Sci., 81:6851-6855; Neuberger et al., 1984, Nature,
312:604-608; Takeda et al., 1985, Nature, 314:452-454; U.S.
Patent No. 4,816,567, which is incorporated by reference
herein in its entirety) by splicing the genes from a mouse
antibody molecule of appropriate antigen specificity together
with genes from a human antibody molecule of appropriate
biological activity can be used. A chimeric antibody is a
molecule in which different portions are derived from
different animal species, such as those having a murine
variable region and a human immunoglobulin constant region.

Alternatively, techniques described for the production
of single chain antibodies (U.S. Patent 4,946,778; Bird,
1988, Science 242:423-426; Huston et al., 1988, Proc. Natl.
Acad. Sci. USA 85:5879-5883; and Ward et al., 1989, Nature
334:544-546) can be adapted to produce PKD1-specific single
chain antibodies. Single chain antibodies are formed by linking
the heavy and light chain fragments of the Fv region via an
amino acid bridge, resulting in a single chain polypeptide.

Further, PKD1-humanized monoclonal antibodies may be
produced using standard techniques (see, for example, U.S.
Patent No. 5,225,539, which is incorporated herein by
reference in its entirety).

Antibody fragments which recognize specific epitopes may
be generated by known techniques. For example, such
fragments include, but are not limited to, the F(ab')2
fragments, which can be produced by pepsin digestion of the
antibody molecule, and the Fab fragments, which can be
generated by reducing the disulfide bridges of the F(ab')2
fragments. Alternatively, Fab expression libraries may be
constructed (Huse et al., 1989, Science, 246:1275-1281) to
allow rapid and easy identification of monoclonal Fab
fragments with the desired specificity.

5.4. SCREENING ASSAYS FOR COMPOUNDS
THAT INTERACT WITH THE PKD1 GENE PRODUCT

The following assays are designed to identify compounds
that bind to the PKD1 gene product; other cellular proteins
that interact with the PKD1 gene product; and compounds that
interfere with the interaction of the PKD1 gene product with
other cellular proteins.

Compounds identified via assays such as those described
herein may be useful, for example, in elaborating the
biological function of the PKD1 gene product, and for
ameliorating ADPKD symptoms caused by mutations within the
PKD1 gene. In instances whereby a mutation within the PKD1
gene causes a lower level of expression, and therefore
results in an overall lower level of PKD1 activity in a cell
or tissue, compounds that interact with the PKD1 gene product
may include ones which accentuate or amplify the activity of
the bound PKD1 protein. Such compounds would thus bring
about an effective increase in the level of PKD1 activity,
thereby ameliorating ADPKD symptoms. In instances whereby
mutations within the PKD1 gene cause aberrant PKD1 proteins to
be made which have a deleterious effect that leads to ADPKD,
compounds that bind PKD1 protein may be identified that
inhibit the activity of the bound PKD1 protein. This decrease
in the aberrant PKD1 activity can therefore serve to
ameliorate ADPKD symptoms. Assays for testing the
effectiveness of compounds identified by, for example,
techniques such as those described in this Section are
discussed, below, in Section 5.3.

5.5. IN VITRO SCREENING ASSAYS FOR
COMPOUNDS THAT BIND TO THE PKD1 PROTEIN

In vitro systems may be designed to identify compounds
capable of binding the PKD1 gene product of the invention. Such
compounds may include, but are not limited to, peptides made
of D- and/or L-configuration amino acids (in, for example, the
form of random peptide libraries; see Lam, K.S. et al., 1991,
Nature 354:82-84), phosphopeptides (in, for example, the form
of random or partially degenerate, directed phosphopeptide
libraries; see, for example, Songyang, Z. et al., 1993, Cell
72:767-778), antibodies, and small or large organic or
inorganic molecules. Compounds identified may be useful, for
example, in modulating the activity of PKD1 proteins,
preferably mutant PKD1 proteins; may be useful in elaborating
the biological function of the PKD1 protein; may be utilized
in screens for identifying compounds that disrupt normal PKD1
interactions; or may in themselves disrupt such interactions.

The principle of the assays used to identify compounds
that bind to the PKD1 protein involves preparing a reaction
mixture of the PKD1 protein and the test compound under
conditions and for a time sufficient to allow the two
components to interact and bind, thus forming a complex which
can be removed and/or detected in the reaction mixture.
These assays can be conducted in a heterogeneous or
homogeneous format. Heterogeneous assays involve anchoring
PKD1 or the test substance onto a solid phase and detecting
PKD1/test substance complexes anchored on the solid phase at
the end of the reaction. In homogeneous assays, the entire
reaction is carried out in a liquid phase. In either
approach, the order of addition of reactants can be varied to
obtain different information about the compounds being
tested.

In a heterogeneous assay system, the PKD1 protein may be
anchored onto a solid surface, and the test substance, which
is not anchored, is labeled, either directly or indirectly.
In practice, microtiter plates are conveniently utilized.
The anchored component may be immobilized by non-covalent or
covalent attachments. Non-covalent attachment may be
accomplished simply by coating the solid surface with a
solution of the protein and drying. Alternatively, an
immobilized antibody, preferably a monoclonal antibody,
specific for the protein may be used to anchor the protein to
the solid surface. The surfaces may be prepared in advance
and stored.

In order to conduct the assay, the labeled component is
added to the coated surface containing the anchored
component. After the reaction is complete, unreacted
components are removed (e.g., by washing) under conditions
such that any complexes formed will remain immobilized on the
solid surface. The detection of complexes anchored on the
solid surface can be accomplished in a number of ways. Where
the labeled compound is pre-labeled, the detection of label
immobilized on the surface indicates that complexes were
formed. Where the labeled component is not pre-labeled, an
indirect label can be used to detect complexes anchored on
the surface, e.g., using a labeled antibody specific for the
binding partner (the antibody, in turn, may be directly
labeled or indirectly labeled with a labeled anti-Ig
antibody).

Alternatively, a heterogeneous reaction can be conducted
in a liquid phase, the reaction products separated from
unreacted components, and complexes detected, e.g., using an
immobilized antibody specific for PKD1 or the test substance
to anchor any complexes formed in solution, and a labeled
antibody specific for the other binding partner to detect
anchored complexes.

In an alternate embodiment of the invention, a
homogeneous assay can be used. In this approach, a preformed
complex of the PKD1 protein and a known binding partner is
prepared in which one of the components is labeled, but the
signal generated by the label is quenched due to complex
formation (see, e.g., U.S. Patent No. 4,109,496 by Rubenstein,
which utilizes this approach for immunoassays). The addition
of a test substance that competes with and displaces one of
the binding partners from the preformed complex will result
in the generation of a signal above background.
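
Reading out this quench-release format comes down to deciding whether a well's signal rises above background. The sketch below assumes one common hit-calling convention that is not taken from this document: a test well is flagged when its signal exceeds the mean of the no-compound control wells by `k` sample standard deviations. The cutoff `k = 3`, the function name and the example readings are all hypothetical.

```python
from statistics import mean, stdev


def displacement_hits(control_signals, test_signals, k=3.0):
    """Return indices of test wells whose signal exceeds background.

    control_signals: readings from the preformed, quenched complex with no
    test substance added (background). test_signals: readings after adding
    each test substance. A well is flagged when its signal is greater than
    mean(controls) + k * stdev(controls).
    """
    if len(control_signals) < 2:
        raise ValueError("need at least two control wells to estimate spread")
    cutoff = mean(control_signals) + k * stdev(control_signals)
    return [i for i, s in enumerate(test_signals) if s > cutoff]


controls = [100.0, 104.0, 98.0, 102.0, 96.0]   # hypothetical background wells
tests = [101.0, 180.0, 99.0, 130.0]            # hypothetical test wells
print(displacement_hits(controls, tests))
```

A stricter `k` trades sensitivity for fewer false positives; the appropriate value would depend on the assay's actual noise.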

5.6. ASSAYS FOR CELLULAR PROTEINS
THAT INTERACT WITH PKD1 PROTEIN

Any method suitable for detecting protein-protein
interactions may be employed for identifying novel PKD1-cellular
or extracellular protein interactions. For example,
some traditional methods which may be employed are
co-immunoprecipitation, crosslinking and copurification
through gradients or chromatographic columns. Additionally,
methods which result in the simultaneous identification of
the genes coding for the protein interacting with a target
protein may be employed. These methods include, for example,
probing expression libraries with labeled target protein,
using this protein in a manner similar to antibody probing of
λgt11 libraries.

One such method which detects protein interactions in
vivo, the yeast two-hybrid system, is described in detail for
illustration only and not by way of limitation. One version
of this system has been described (Chien et al., 1991, Proc.
Natl. Acad. Sci. USA, 88:9578-9582) and is commercially
available from Clontech (Palo Alto, CA).

Briefly, utilizing such a system, plasmids are
constructed that encode two hybrid proteins: one consists of
the DNA-binding domain of a transcription activator protein
fused to one test protein "X" and the other consists of the
activator protein's activation domain fused to another test
protein "Y". Thus, either "X" or "Y" in this system may be
wild type or mutant PKD1, while the other may be a test
protein or peptide. The plasmids are transformed into a
strain of the yeast Saccharomyces cerevisiae that contains a
reporter gene (e.g., lacZ) whose regulatory region contains
the activator's binding sites. Neither hybrid protein alone
can activate transcription of the reporter gene: the
DNA-binding domain hybrid because it does not provide activation
function, and the activation domain hybrid because it cannot
localize to the activator's binding sites. Interaction of
the two proteins reconstitutes the functional activator
protein and results in expression of the reporter gene, which
is detected by an assay for the reporter gene product.
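
The reporter logic just described can be summarized as a small truth table. The Python model below is a conceptual sketch only: the class and field names are hypothetical, and a real reporter readout is graded rather than strictly on/off. It encodes the requirement stated in the text that both hybrids be present and that "X" and "Y" interact before the reporter is expressed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YeastCell:
    has_bd_hybrid: bool  # DNA-binding domain fused to test protein "X"
    has_ad_hybrid: bool  # activation domain fused to test protein "Y"
    x_binds_y: bool      # whether "X" and "Y" physically interact


def reporter_expressed(cell: YeastCell) -> bool:
    # The BD hybrid alone sits on the activator's binding sites but supplies
    # no activation function; the AD hybrid alone cannot reach the binding
    # sites. Only an X-Y interaction brings both functions to the reporter.
    return cell.has_bd_hybrid and cell.has_ad_hybrid and cell.x_binds_y


for bd, ad, binds in [(True, False, False), (False, True, False),
                      (True, True, False), (True, True, True)]:
    print(bd, ad, binds, reporter_expressed(YeastCell(bd, ad, binds)))
```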

The two-hybrid system or related methodology can be used
to screen activation domain libraries for proteins that
interact with a PKD1 protein. Total genomic or cDNA
sequences are fused to the DNA encoding an activation domain.
This library and a plasmid encoding a hybrid of the PKD1
protein fused to the DNA-binding domain are cotransformed
into a yeast reporter strain, and the resulting transformants
are screened for those that express the reporter gene. These
colonies are purified and the plasmids responsible for
reporter gene expression are isolated. DNA sequencing is
then used to identify the proteins encoded by the library
plasmids.

For example, and not by way of limitation, the PKD1 gene
can be cloned into a vector such that it is translationally
fused to the DNA encoding the DNA-binding domain of the GAL4
protein. A cDNA library of the cell line from which proteins
that interact with PKD1 are to be detected can be made using
methods routinely practiced in the art. According to this
particular system, for example, the cDNA fragments can be
inserted into a vector such that they are translationally
fused to the activation domain of GAL4. This library can be
co-transformed along with the PKD1-GAL4 DNA-binding domain
fusion plasmid into a yeast strain which contains a lacZ gene
driven by a promoter which contains GAL4 activation
sequences. A cDNA-encoded protein, fused to the GAL4 activation
domain, that interacts with PKD1 will reconstitute an active
GAL4 protein and thereby drive expression of the lacZ gene.
Colonies which express lacZ can be detected by their blue
color in the presence of X-gal. The cDNA can then be
extracted from strains derived from these and used to produce
and isolate the PKD1-interacting protein using techniques
routinely practiced in the art.


5.7. ASSAYS FOR COMPOUNDS THAT INTERFERE
WITH PKD1/CELLULAR PROTEIN INTERACTION

The PKD1 protein of the invention may, in vivo, interact
with one or more cellular or extracellular proteins. Such
cellular proteins are referred to herein as "binding
partners". Compounds that disrupt such interactions may be
useful in regulating the activity of the PKD1 protein,
especially mutant PKD1 proteins. Such compounds may include,
but are not limited to, molecules such as antibodies,
peptides, and the like described in Section 5.2.1, above.

In instances whereby ADPKD symptoms are caused by a
mutation within the PKD1 gene which produces PKD1 gene
products having aberrant, gain-of-function activity,
compounds identified that disrupt such interactions may
therefore inhibit the aberrant PKD1 activity. Preferably,
compounds may be identified which disrupt the interaction of
mutant PKD1 gene products with cellular or extracellular
proteins, but do not substantially affect the interactions of
the normal PKD1 protein. Such compounds may be identified by
comparing the effectiveness of a compound to disrupt
interactions in an assay containing normal PKD1 protein to
that of an assay containing mutant PKD1 protein.

The basic principle of the assay systems used to
identify compounds that interfere with the interaction
between the PKD1 protein, preferably mutant PKD1 protein, and
its cellular or extracellular protein binding partner or
partners involves preparing a reaction mixture containing the
PKD1 protein and the binding partner under conditions and for
a time sufficient to allow the two proteins to interact and
bind, thus forming a complex. In order to test a compound
for inhibitory activity, the reaction is conducted in the
presence and absence of the test compound, i.e., the test
compound may be initially included in the reaction mixture,
or added at a time subsequent to the addition of PKD1 and its
cellular or extracellular binding partner; controls are
incubated without the test compound or with a placebo. The
formation of any complexes between the PKD1 protein and the
cellular or extracellular binding partner is then detected.
The formation of a complex in the control reaction, but not
in the reaction mixture containing the test compound,
indicates that the compound interferes with the interaction
of the PKD1 protein and the interactive protein. As noted
above, complex formation within reaction mixtures containing
the test compound and normal PKD1 protein may also be
compared to complex formation within reaction mixtures
containing the test compound and mutant PKD1 protein. This
comparison may be important in those cases wherein it is
desirable to identify compounds that disrupt interactions of
mutant but not normal PKD1 proteins.
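The comparison between normal and mutant reaction mixtures can be reduced to a simple decision rule. The Python sketch below is hypothetical: the function names, the use of percent inhibition of complex formation as the readout, and the 50% and 20% cut-offs are illustrative assumptions, not values given in this specification.

```python
def is_mutant_selective(inhibition_mutant: float, inhibition_normal: float,
                        min_mutant: float = 50.0, max_normal: float = 20.0) -> bool:
    """True if a compound strongly disrupts the mutant PKD1 complex while
    leaving the normal PKD1 complex largely intact.

    Inputs are percent inhibition (0-100) of complex formation measured in
    matched reaction mixtures; the default cut-offs are illustrative only.
    """
    return inhibition_mutant >= min_mutant and inhibition_normal <= max_normal


def selectivity_ratio(inhibition_mutant: float, inhibition_normal: float) -> float:
    """Mutant-to-normal inhibition ratio; larger values mean more selective.
    Returns infinity when the normal complex is not inhibited at all."""
    if inhibition_normal <= 0:
        return float("inf")
    return inhibition_mutant / inhibition_normal


# Hypothetical readings: (percent inhibition of mutant, of normal) complex.
screen = {"compound_A": (85.0, 10.0), "compound_B": (80.0, 75.0), "compound_C": (15.0, 5.0)}
hits = [name for name, (mut, norm) in screen.items() if is_mutant_selective(mut, norm)]
```

With these made-up readings only `compound_A` qualifies: `compound_B` also disrupts the normal complex, and `compound_C` barely affects the mutant one.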

The assay for compounds that interfere with the
interaction of the binding partners can be conducted in a
heterogeneous or homogeneous format. Heterogeneous assays
involve anchoring one of the binding partners onto a solid
phase and detecting complexes anchored on the solid phase at
the end of the reaction. In homogeneous assays, the entire
reaction is carried out in a liquid phase. In either
approach, the order of addition of reactants can be varied to
obtain different information about the compounds being
tested. For example, test compounds that interfere with the
interaction between the binding partners, e.g., by
competition, can be identified by conducting the reaction in
the presence of the test substance, i.e., by adding the test
substance to the reaction mixture prior to or simultaneously
with the PKD1 protein and interactive cellular or
extracellular protein. On the other hand, test compounds
that disrupt preformed complexes, e.g., compounds with higher
binding constants that displace one of the binding partners
from the complex, can be tested by adding the test compound
to the reaction mixture after complexes have been formed.
The various formats are described briefly below.

In a heterogeneous assay system, one binding partner,
e.g., either the PKD1 protein or the interactive cellular or
extracellular protein, is anchored onto a solid surface, and
its binding partner, which is not anchored, is labeled,
either directly or indirectly. In practice, microtiter
plates are conveniently utilized. The anchored species may
be immobilized by non-covalent or covalent attachments.
Non-covalent attachment may be accomplished simply by coating
the solid surface with a solution of the protein and drying.
Alternatively, an immobilized antibody specific for the
protein may be used to anchor the protein to the solid
surface. The surfaces may be prepared in advance and stored.

In order to conduct the assay, the binding partner of
the immobilized species is added to the coated surface with
or without the test compound. After the reaction is
complete, unreacted components are removed (e.g., by washing)
and any complexes formed will remain immobilized on the solid
surface. The detection of complexes anchored on the solid
surface can be accomplished in a number of ways. Where the
binding partner was pre-labeled, the detection of label
immobilized on the surface indicates that complexes were
formed. Where the binding partner is not pre-labeled, an
indirect label can be used to detect complexes anchored on
the surface, e.g., using a labeled antibody specific for the
binding partner (the antibody, in turn, may be directly
labeled or indirectly labeled with a labeled anti-Ig
antibody). Depending upon the order of addition of reaction
components, test compounds which inhibit complex formation or
which disrupt preformed complexes can be detected.

Alternatively, the reaction can be conducted in a liquid
phase in the presence or absence of the test compound, the
reaction products separated from unreacted components, and
complexes detected, e.g., using an immobilized antibody
specific for one binding partner to anchor any complexes
formed in solution, and a labeled antibody specific for the
other binding partner to detect anchored complexes. Again,
depending upon the order of addition of reactants to the
liquid phase, test compounds which inhibit complex formation
or which disrupt preformed complexes can be identified.

In an alternate embodiment of the invention, a
homogeneous assay can be used. In this approach, a preformed
complex of the PKD1 protein and the interactive cellular or
extracellular protein is prepared in which one of the binding
partners is labeled, but the signal generated by the label is
quenched due to complex formation (see, e.g., U.S. Patent
No. 4,109,496 by Rubenstein, which utilizes this approach for
immunoassays). The addition of a test substance that
competes with and displaces one of the binding partners from
the preformed complex will result in the generation of a
signal above background. In this way, test substances which
disrupt the PKD1 protein-cellular or extracellular protein
interaction can be identified.
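Deciding whether a test substance produces a "signal above background" requires some threshold. A minimal sketch, assuming the background is estimated from replicate wells containing the quenched preformed complex alone and that the cut-off is the background mean plus three standard deviations (a common screening convention, not one specified here); the function names and substance labels are hypothetical.

```python
from statistics import mean, stdev


def signal_threshold(background_signals, k=3.0):
    """Cut-off for a 'signal above background': mean of the quenched-complex
    control wells plus k sample standard deviations (k = 3 is an assumption)."""
    return mean(background_signals) + k * stdev(background_signals)


def displacing_substances(readings, background_signals, k=3.0):
    """Names of test substances whose signal exceeds the cut-off, i.e. that
    appear to displace a binding partner and relieve quenching of the label."""
    cutoff = signal_threshold(background_signals, k)
    return sorted(name for name, signal in readings.items() if signal > cutoff)


# Hypothetical data: four control wells and three test substances.
background = [10.0, 12.0, 8.0, 10.0]
readings = {"substance_1": 30.0, "substance_2": 12.0, "substance_3": 20.0}
```

Here the cut-off is about 14.9, so `displacing_substances(readings, background)` reports `substance_1` and `substance_3`.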

In a particular embodiment, the PKD1 protein can be
prepared for immobilization using recombinant DNA techniques
described in Section 5.1.2.2, supra. For example, the PKD1
coding region can be fused to the glutathione-S-transferase
(GST) gene using the fusion vector pGEX-5X-1, in such a
manner that its binding activity is maintained in the
resulting fusion protein. The interactive cellular or
extracellular protein can be purified and used to raise a
monoclonal antibody, using methods routinely practiced in the
art and described above. This antibody can be labeled with
the radioactive isotope 125I, for example, by methods
routinely practiced in the art. In a heterogeneous assay,
e.g., the GST-PKD1 fusion protein can be anchored to
glutathione-agarose beads. The interactive cellular or
extracellular protein can then be added in the presence or
absence of the test compound in a manner that allows
interaction and binding to occur. At the end of the reaction
period, unbound material can be washed away, and the labeled
monoclonal antibody can be added to the system and allowed to
bind to the complexed binding partners. The interaction
between the PKD1 protein and the interactive cellular or
extracellular protein can be detected by measuring the amount
of radioactivity that remains associated with the
glutathione-agarose beads. A successful inhibition of the
interaction by the test compound will result in a decrease in
measured radioactivity.
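The decrease in bead-associated radioactivity can be expressed as percent inhibition. A sketch in Python, assuming replicate counts per minute (cpm) and a nonspecific-binding control (for example, beads carrying GST alone) whose counts are subtracted; that control and the function name are assumptions for illustration.

```python
from statistics import mean


def percent_inhibition(test_cpm, control_cpm, background_cpm):
    """Percent inhibition of complex formation from bead-associated counts.

    test_cpm       -- replicate cpm with the test compound present
    control_cpm    -- replicate cpm without compound (or with placebo)
    background_cpm -- replicate cpm from an assumed nonspecific-binding
                      control, e.g. beads carrying GST alone
    """
    background = mean(background_cpm)
    specific_control = mean(control_cpm) - background
    if specific_control <= 0:
        raise ValueError("no specific binding above background in the control")
    specific_test = mean(test_cpm) - background
    return 100.0 * (1.0 - specific_test / specific_control)
```

For example, mean counts of 300 (test), 1100 (control) and 100 (background) give 80% inhibition.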

Alternatively, the GST-PKD1 fusion protein and the
interactive cellular or extracellular protein can be mixed
together in liquid in the absence of the solid
glutathione-agarose beads. The test compound can be added either
during or after the binding partners are allowed to interact. This
mixture can then be added to the glutathione-agarose beads
and unbound material washed away. Again, the extent of
inhibition of the binding partner interaction can be detected
by adding the labeled antibody and measuring the
radioactivity associated with the beads.

In another embodiment of the invention, these same
techniques can be employed using peptide fragments that
correspond to the binding domains of the PKD1 protein and the
interactive cellular or extracellular protein, respectively,
in place of one or both of the full length proteins. Any
number of methods routinely practiced in the art can be used
to identify and isolate the protein's binding site. These
methods include, but are not limited to, mutagenesis of one
of the genes encoding the proteins and screening for
disruption of binding in a co-immunoprecipitation assay.
Compensating mutations in the PKD1 gene can be selected.
Sequence analysis of the genes encoding the respective
proteins will reveal the mutations that correspond to the
region of the protein involved in interactive binding.
Alternatively, one protein can be anchored to a solid surface
using methods described in this Section above, and allowed to
interact with and bind to its labeled binding partner, which
has been treated with a proteolytic enzyme, such as trypsin.
After washing, a short, labeled peptide comprising the
binding domain may remain associated with the solid material,
which can be isolated and identified by amino acid
sequencing. Also, once the gene coding for the
cellular or extracellular protein is obtained, short gene
segments can be engineered to express peptide fragments of
the protein, which can then be tested for binding activity
and purified or synthesized.

For example, and not by way of limitation, PKD1 can be
anchored to a solid material as described above in this
section by making a GST-PKD1 fusion protein and allowing it
to bind to glutathione-agarose beads. The interactive
cellular protein can be labeled with a radioactive isotope,
such as 35S, and cleaved with a proteolytic enzyme such as
trypsin. Cleavage products can then be added to the anchored
GST-PKD1 fusion protein and allowed to bind. After washing
away unbound peptides, labeled bound material, representing
the cellular or extracellular protein binding domain, can be
eluted, purified, and analyzed for amino acid sequence by
methods described in Section 5.1.2.2, supra. Peptides so
identified can be produced synthetically or fused to
appropriate facilitative proteins using recombinant DNA
technology, as described in Section 5.1.2.2, supra.
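Before the bound fragments are sequenced, it can help to predict which peptides a trypsin digest of the interactive protein would produce. The sketch below assumes the commonly used simplified trypsin rule (cleavage on the C-terminal side of lysine or arginine, but not when the next residue is proline); real digests can deviate from this rule, and the example sequence is made up.

```python
def tryptic_peptides(sequence: str) -> list[str]:
    """Predict trypsin cleavage products of a protein sequence.

    Uses the simplified rule that trypsin cuts after lysine (K) or
    arginine (R) unless the following residue is proline (P).
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal fragment
    return peptides
```

For the made-up sequence `"AKPGRSTKLM"`, the K before P is skipped, so the predicted fragments are `["AKPGR", "STK", "LM"]`.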

5.8. ASSAYS FOR ADPKD-INHIBITORY ACTIVITY
Any of the binding compounds, including but not limited
to, compounds such as those identified in the foregoing assay
systems, may be tested for anti-ADPKD activity. ADPKD, an
autosomal dominant disorder, may involve underexpression of a
wild-type PKD1 allele, or expression of a PKD1 gene product
that exhibits little or no PKD1 activity. In such an
instance, even though the PKD1 gene product is present, the
overall level of normal PKD1 gene product present is
insufficient and leads to ADPKD symptoms. As such,
"anti-ADPKD activity", as used herein, may refer to an increase in
the level of expression of the normal PKD1 gene product, to
levels wherein ADPKD symptoms are ameliorated. Additionally,
the term may refer to an increase in the level of normal PKD1
activity in the cell, to levels wherein ADPKD symptoms are
ameliorated.

Alternatively, ADPKD may be caused by the production of
an aberrant mutant form of the PKD1 protein, which either
interferes with the normal allele product or introduces a
novel function into the cell, which then leads to the mutant
phenotype. For example, a mutant PKD1 protein may compete
with the wild type protein for the binding of a substance
required to relay a signal inside or outside of a cell.
Circumstances such as these are referred to as "gain of
function" mutations. It is possible that different
mechanisms occur in different patients, which can
lead to variation in the mutant phenotype.

"Anti-ADPKD activity", as used herein, may refer to a
decrease in the level and/or activity of such a mutant PKD1
protein so that ADPKD symptoms are ameliorated.

Cell-based and animal model-based assays for the
identification of compounds exhibiting anti-ADPKD activity
are described below.

5.8.1. CELL BASED ASSAYS 
Cells that contain and express mutant PKD1 gene
sequences which encode mutant PKD1 protein, and thus exhibit
cellular phenotypes associated with ADPKD, may be utilized to
identify compounds that possess anti-ADPKD activity. Such
cells may include cell lines consisting of naturally
occurring or engineered cells which express mutant PKD1 gene
products, or both normal and mutant PKD1 gene products. Such cells
include, but are not limited to, renal epithelial cells,
including primary and immortalized human renal tubular cells,
MDCK cells, LLC-PK1 cells, and human renal carcinoma cells.
Cells, such as those described above, which exhibit
ADPKD-like cellular phenotypes, may be exposed to a compound
suspected of exhibiting anti-ADPKD activity at a sufficient
concentration and for a time sufficient to elicit such
anti-ADPKD activity in the exposed cells. After exposure, the
cells are examined to determine whether one or more of the
ADPKD-like cellular phenotypes has been altered to resemble a
more wild type, non-ADPKD phenotype.
Among the cellular phenotypes which may be followed in
the above assays are differences in the apical/basolateral
distribution of membrane proteins. For example, normal
(i.e., non-ADPKD) renal tubular cells in situ and in culture
under defined conditions have a characteristic pattern of
apical/basolateral distribution of cell surface markers.
ADPKD renal cells, by contrast, exhibit a distribution
pattern that reflects a partially reversed apical/basolateral
polarity relative to the normal distribution. For example,
sodium-potassium ATPase is found on the basolateral membranes
of normal renal epithelial cells but is found on the apical surface
of ADPKD epithelial cells, both in cystic epithelia in vivo
and in ADPKD cells in culture (Wilson et al., 1991, Am. J.
Physiol. 260:F420-F430). Another marker which exhibits an
alteration in polarity in normal versus ADPKD-affected cells
is the EGF receptor, which is normally
located basolaterally but in ADPKD cells is mislocated to
the apical surface. Such an apical/basolateral marker
distribution phenotype may be followed, for example, by
standard immunohistology techniques using antibodies specific
to the marker(s) of interest in conjunction with procedures
that are well known to those of skill in the art.

Additionally, assays for the function of the PKD1 gene
product can, for example, include a measure of extracellular
matrix (ECM) components, such as proteoglycans, laminin,
fibronectin and the like, since studies in both ADPKD and
rat models of acquired cystic disease (Carone, F.A. et
al., 1989, Kidney International 35:1034-1040) have shown
alterations in such components. Thus, any compound which
serves to create an extracellular matrix environment which
more fully mimics the normal ECM should be considered as a
candidate for testing for an ability to ameliorate ADPKD
symptoms.
5.8.2. ANIMAL MODEL ASSAYS
The ability of a compound, such as those identified in
the foregoing binding assays, to prevent or inhibit disease
may be assessed in animal models for ADPKD. Several
naturally occurring mutations causing renal cystic disease have
been found in animals. While these are not perfect models of
ADPKD, they provide test systems for assaying the effects of
compounds that interact with PKD1 proteins. Of these models,
the Han:SPRD rat model is the only autosomal dominant
example. Such a model is well known to those of skill in the
art. See, for example, Kaspareit-Rittinghausen et al., 1989,
Vet. Path. 26:195. In addition, several recessive models
exist (Reeders, S., 1992, Nature Genetics 1:235).

Additionally, animal models exhibiting ADPKD-like
symptoms may be engineered by utilizing PKD1 sequences such
as those described above in Section 5.1, in conjunction
with techniques for producing transgenic animals that are
well known to those of skill in the art.

Animals of any species, including, but not limited to,
mice, rats, rabbits, guinea pigs, pigs, micro-pigs, goats,
and non-human primates, e.g., baboons, squirrel monkeys,
and chimpanzees, may be used to generate such ADPKD animal
models.

In instances wherein the PKD1 mutation leading to ADPKD
symptoms causes a drop in the level of PKD1 protein or causes
an ineffective PKD1 protein to be made (i.e., the PKD1
mutation is a dominant loss-of-function mutation), various
strategies may be utilized to generate animal models
exhibiting ADPKD-like symptoms. For example, PKD1 knockout
animals, such as mice, may be generated and used to screen
for compounds which exhibit an ability to ameliorate ADPKD
symptoms. Animals may be generated whose cells contain one
inactivated copy of a PKD1 homologue. In such a strategy,
human PKD1 gene sequences may be used to identify a PKD1
homologue within the animal of interest, utilizing techniques
described above in Section 5.1. Once such a PKD1 homologue
has been identified, well-known techniques such as those
described below in Section 5.8.2.1 may be utilized to
disrupt and inactivate the endogenous PKD1 homologue and,
further, to produce animals which are heterozygous for such
an inactivated PKD1 homologue. Such animals may then be
observed for the development of ADPKD-like symptoms.

In instances wherein a PKD1 mutation causes a PKD1
protein having an aberrant PKD1 activity which leads to ADPKD
symptoms (i.e., the PKD1 mutation is a dominant
gain-of-function mutation), strategies such as those now described may
be utilized to generate ADPKD animal models. First, for
example, a human PKD1 gene sequence containing such a
gain-of-function PKD1 mutation, and encoding such an aberrant PKD1
protein, may be introduced into the genome of the animal of
interest by utilizing well known techniques such as those
described below in Section 5.8.2.1. Such a PKD1 nucleic
acid sequence must be controlled by a regulatory nucleic acid
sequence which allows the mutant human PKD1 sequence to be
expressed in the cells, preferably kidney cells, of the
animal of interest. The human PKD1 regulatory
promoter/enhancer sequences may be sufficient for such
expression. Alternatively, the mutant PKD1 gene sequences
may be controlled by regulatory sequences endogenous to the
animal of interest, or by any other regulatory sequences
which are effective in bringing about the expression of the
mutant human PKD1 sequences in the animal cells of interest.

Expression of the mutant human PKD1 gene may be assayed,
for example, by standard Northern analysis, and the
production of the mutant human PKD1 gene product may be
assayed by, for example, detecting its presence by utilizing
techniques whereby binding of an antibody directed against
the mutant human PKD1 gene product is detected. Those
animals found to express the mutant human PKD1 gene product
may then be observed for the development of ADPKD-like
symptoms.

WO 95/34573 PCT/US95/07079

Alternatively, animal models of ADPKD may be produced by
engineering animals containing mutations within one copy of
their endogenous PKD1 homologue which correspond to
gain-of-function mutations within the human PKD1 gene. Utilizing
such a strategy, a PKD1 homologue may be identified and
cloned from the animal of interest, using techniques such as
those described above in Section 5.1. One or more
gain-of-function mutations may be engineered into such a PKD1 homologue
which correspond to gain-of-function mutations within the
human PKD1 gene. By "corresponding", it is meant that the
mutant gene product produced by such an engineered PKD1
homologue will exhibit an aberrant PKD1 activity which is
substantially similar to that exhibited by the mutant human
PKD1 protein.

The engineered PKD1 homologue may then be introduced
into the genome of the animal of interest, using techniques
such as those described below in Section 5.8.2.1. Because
the mutation introduced into the engineered PKD1 homologue is
expected to be a dominant gain-of-function mutation,
integration into the genome need not be via homologous
recombination, although such a route is preferred.

Once transgenic animals have been generated, the
expression of the mutant PKD1 homologue gene and protein may be
assayed utilizing standard techniques, such as Northern
and/or Western analyses. Once the mutant PKD1 homologue protein
is found to be expressed in the cells or tissues of interest,
preferably kidney, the transgenic animals may be observed for
the development of ADPKD-like symptoms.

Any of the ADPKD animal models described herein may be
used to test compounds for an ability to ameliorate ADPKD
symptoms.

In addition, as described in detail in Section 5.11,
infra, such animal models can be used to determine the LD50
and the ED50 in animal subjects, and such data can be used to
determine the in vivo efficacy of potential ADPKD treatments.
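The LD50 (the dose lethal to 50% of a test population) and the ED50 (the dose therapeutically effective in 50% of the population) are commonly combined into a therapeutic index; larger values indicate a wider margin between efficacy and toxicity. This is stated as the standard pharmacological definition, and the numbers in the example are hypothetical, not data from this specification:

```latex
\mathrm{TI} = \frac{\mathrm{LD}_{50}}{\mathrm{ED}_{50}},
\qquad \text{e.g.}\quad
\frac{200~\mathrm{mg/kg}}{10~\mathrm{mg/kg}} = 20
```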
5.8.2.1. PRODUCTION OF PKD1 TRANSGENIC ANIMALS
Any technique known in the art may be used to introduce
a PKD1 gene into animals to produce the founder lines of
transgenic animals. Such techniques include, but are not
limited to, pronuclear microinjection (Hoppe, P.C. and Wagner,
T.E., 1989, U.S. Pat. No. 4,873,191); retrovirus-mediated
gene transfer into germ lines (Van der Putten et al., 1985,
Proc. Natl. Acad. Sci. USA 82:6148-6152); gene targeting in
embryonic stem cells (Thompson et al., 1989, Cell 56:313-321);
electroporation of embryos (Lo, 1983, Mol. Cell. Biol.
3:1803-1814); and sperm-mediated gene transfer (Lavitrano et
al., 1989, Cell 57:717-723). For a review of such
techniques, see Gordon, 1989, Transgenic Animals, Intl. Rev.
Cytol. 115:171-229, which is incorporated by reference herein
in its entirety.

When it is desired that the PKD1 transgene be integrated
into the chromosomal site of the endogenous PKD1 gene, gene
targeting is preferred. Briefly, when such a technique is to
be utilized, vectors containing some nucleotide sequences
homologous to the endogenous PKD1 gene of interest are
designed for the purpose of integrating, via homologous
recombination with chromosomal sequences, into and disrupting
the function of the nucleotide sequence of the endogenous
PKD1 gene.

Once the PKD1 founder animals are produced, they may be
bred, inbred, outbred, or crossbred to produce colonies of
the particular animal. Examples of such breeding strategies
include, but are not limited to: outbreeding of founder
animals with more than one integration site in order to
establish separate lines; inbreeding of separate lines in
order to produce compound PKD1 transgenics that express the
PKD1 transgene at higher levels because of the effects of
additive expression of each PKD1 transgene; crossing of
heterozygous transgenic animals to produce animals homozygous
for a given integration site in order to both augment
expression and eliminate the possible need for screening of
animals by DNA analysis; crossing of separate homozygous
lines to produce compound heterozygous or homozygous lines;
and breeding animals to different inbred genetic backgrounds so
as to examine effects of modifying alleles on expression of
the PKD1 transgene and the development of ADPKD-like
symptoms. One such approach is to cross the PKD1 founder
animals with a wild type strain to produce an F1 generation
that exhibits ADPKD symptoms, such as the development of
polycystic kidneys. The F1 generation may then be inbred in
order to develop a homozygous line, if it is found that
homozygous PKD1 transgenic animals are viable.
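The expected outcome of the crosses described above follows Mendelian segregation. The sketch below assumes a single autosomal integration site, no sex linkage, and equal viability of all genotypes; the allele notation ("T" for an allele carrying the transgene, "-" for one lacking it) and the function name are introduced here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent_a: str, parent_b: str) -> dict:
    """Expected offspring genotype frequencies at one autosomal integration site.

    Each parent is written as its two alleles: "T" for an allele carrying the
    transgene, "-" for one lacking it (so "T-" is heterozygous, "--" wild type).
    Assumes Mendelian segregation and equal viability of all genotypes.
    """
    offspring = Counter(
        "".join(sorted(a + b, reverse=True))  # normalise "-T" to "T-"
        for a, b in product(parent_a, parent_b)
    )
    total = sum(offspring.values())
    return {genotype: Fraction(n, total) for genotype, n in offspring.items()}
```

Crossing a heterozygous founder to wild type (`cross("T-", "--")`) gives half transgenic F1 offspring; intercrossing two heterozygous F1 animals (`cross("T-", "T-")`) gives an expected one quarter homozygous, one half heterozygous and one quarter non-transgenic offspring, provided homozygotes are viable.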

The present invention provides for transgenic animals
that carry the transgene in all their cells, as well as
animals which carry the transgene in some, but not all, of their
cells, i.e., mosaic animals. The transgene may be integrated
as a single transgene or in concatemers, e.g., head-to-head
tandems or head-to-tail tandems.

5.8.2.2. SELECTION AND CHARACTERIZATION
OF THE PKD1 TRANSGENIC ANIMALS

The PKD1 transgenic animals that are produced in
accordance with the procedures detailed above in Section
5.8.2.1 should be screened and evaluated to select those
animals which may be used as suitable animal models for
ADPKD.

Initial screening may be accomplished by Southern blot
analysis or PCR techniques to analyze animal tissues to
verify that integration of the transgene has taken place.
The level of mRNA expression of the transgene in the tissues
of the transgenic animals may also be assessed using
techniques which include, but are not limited to, Northern
blot analysis of tissue samples obtained from the animal, in
situ hybridization analysis, and reverse transcriptase-PCR
(RT-PCR). Samples of PKD1-expressing tissue, kidney tissue,
for example, may be evaluated immunocytochemically using
antibodies specific for the PKD1 transgene product.

The PKD1 transgenic animals that express PKD1 mRNA or
gene product (detected immunocytochemically, using antibodies
directed against PKD1 tag epitopes) at easily detectable
levels should then be further evaluated histopathologically
to identify those animals which display characteristic
ADPKD-like symptoms. Such transgenic animals serve as suitable
model systems for ADPKD.

5.8.2.3. USES OF THE PKD1 ANIMAL MODELS
The PKD1 animal models of the invention may be used as
model systems for ADPKD and/or to generate cell
lines that can be used as cell culture models for this
disorder.

The PKD1 transgenic animal model systems for ADPKD may
be used as a test substrate to identify drugs,
pharmaceuticals, therapies and interventions which may be
effective in treating such a disorder. Potential therapeutic
agents may be tested by systemic or local administration.
Suitable routes may include oral, rectal, or intestinal
administration; parenteral delivery, including intramuscular,
subcutaneous, and intramedullary injections, as well as
intrathecal, direct intraventricular, intravenous,
intraperitoneal, intranasal, or intraocular injections, to
name a few. The response of the animals to the treatment may
be monitored by assessing the reversal of disorders
associated with ADPKD. With regard to intervention, any
treatments which reverse any aspect of ADPKD-like symptoms
should be considered as candidates for human ADPKD
therapeutic intervention. However, treatments or regimens
which reverse the constellation of pathologies associated
with any of these disorders may be preferred. Dosages of
test agents may be determined by deriving dose-response
curves, as discussed in Section 5.11, below.
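One simple way to read an ED50 off a dose-response curve is linear interpolation on a logarithmic dose axis between the two tested doses that bracket a 50% response. This is a sketch under that assumption, for quantal response data (fraction of animals responding) that increase with dose; fitting a full sigmoidal model, such as a probit or logistic curve, is the more usual approach and is not shown. The function name and data are hypothetical.

```python
import math


def interpolate_ed50(doses, responses):
    """Estimate the dose giving a 50% response by linear interpolation on log10(dose).

    doses     -- increasing positive doses (e.g. mg/kg)
    responses -- fraction of animals responding at each dose (0-1),
                 assumed to increase with dose
    """
    points = list(zip(doses, responses))
    for (d_lo, r_lo), (d_hi, r_hi) in zip(points, points[1:]):
        if r_lo <= 0.5 <= r_hi and r_hi > r_lo:
            x_lo, x_hi = math.log10(d_lo), math.log10(d_hi)
            x = x_lo + (0.5 - r_lo) * (x_hi - x_lo) / (r_hi - r_lo)
            return 10 ** x
    raise ValueError("the 50% response is not bracketed by the tested doses")
```

With hypothetical doses of 1, 10 and 100 mg/kg producing responses of 0.1, 0.3 and 0.7, the estimate falls between 10 and 100 mg/kg at about 31.6 mg/kg (10^1.5).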

In an alternate embodiment, the PKD1 transgenic animals
of the invention may be used to derive a cell line which may
be used as a test substrate in culture, to identify agents
that ameliorate ADPKD-like symptoms. While primary cultures
derived from the PKD1 transgenic animals of the invention may
be utilized, the generation of continuous cell lines is
preferred. For examples of techniques which may be used to
derive a continuous cell line from the transgenic animals,
see Small et al., 1985, Mol. Cell. Biol. 5:642-648.



5.9. COMPOUNDS THAT INHIBIT EXPRESSION,
SYNTHESIS OR ACTIVITY OF MUTANT
PKD1 ACTIVITY



As discussed above, dominant mutations in the PKD1 gene
that cause ADPKD may act as gain-of-function mutations which
produce a form of the PKD1 protein which exhibits an aberrant
activity that leads to the formation of ADPKD symptoms. A
variety of techniques may be utilized to inhibit the
expression, synthesis, or activity of such mutant PKD1 genes
and gene products (i.e., proteins).

For example, compounds such as those identified through
assays described above in Section 5.4, which exhibit
inhibitory activity, may be used in accordance with the
invention to ameliorate ADPKD symptoms. Such molecules may
include, but are not limited to, small and large organic
molecules, peptides, and antibodies. Inhibitory antibody
techniques are described below in Section 5.9.2.

Further, antisense and ribozyme molecules which inhibit
expression of the PKD1 gene, preferably the mutant PKD1 gene,
may also be used to inhibit the aberrant PKD1 activity. Such
techniques are described below in Section 5.9.1. Still
further, as described below in Section 5.9.1, triple helix
molecules may be utilized in inhibiting the aberrant PKD1
activity.

5.9.1. INHIBITORY ANTISENSE, RIBOZYME
AND TRIPLE HELIX APPROACHES

Among the compounds which may exhibit anti-ADPKD
activity are antisense, ribozyme, and triple helix molecules.
Such molecules may be designed to reduce or inhibit mutant
PKD1 activity. Techniques for the production and use of such
molecules are well known to those of skill in the art.
Antisense RNA and DNA molecules act to directly block
the translation of mRNA by binding to targeted mRNA and
preventing protein translation. With respect to antisense
DNA, oligodeoxyribonucleotides derived from the translation
initiation site, e.g., between the -10 and +10 regions of
the PKD1 nucleotide sequence of interest, are preferred.
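Choosing an antisense oligodeoxyribonucleotide that spans the -10 to +10 region can be sketched as extracting that window from the sense (mRNA-like) strand and taking its reverse complement. The sketch assumes the usual convention that +1 is the A of the ATG start codon and that there is no position 0, which gives a 20-mer; the function name is hypothetical and the example sequence is invented, not PKD1 sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_oligo(sense_dna: str, start_codon_index: int,
                    upstream: int = 10, downstream: int = 10) -> str:
    """Return an antisense DNA oligonucleotide spanning the translation start site.

    sense_dna         -- cDNA in the mRNA (sense) orientation, written 5'->3'
    start_codon_index -- 0-based index of the A of the ATG start codon (+1)
    The region -upstream..+downstream (no position 0) is extracted and
    reverse-complemented, so the oligo is written 5'->3' and pairs with the mRNA.
    """
    begin = start_codon_index - upstream
    end = start_codon_index + downstream
    if begin < 0 or end > len(sense_dna):
        raise ValueError("requested window extends beyond the sequence")
    target = sense_dna[begin:end].upper()
    return target.translate(COMPLEMENT)[::-1]
```

For the invented sequence `"GGGCCCAAATATGCGTACGTTTT"`, whose ATG starts at index 10, `antisense_oligo(seq, 10)` returns the 20-mer `"ACGTACGCATATTTGGGCCC"`.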

Ribozymes are enzymatic RNA molecules capable of
catalyzing the specific cleavage of RNA. The mechanism of
ribozyme action involves sequence-specific hybridization of
the ribozyme molecule to complementary target RNA, followed
by endonucleolytic cleavage. The composition of ribozyme
molecules must include one or more sequences complementary to
the target PKD1 mRNA, preferably the mutant PKD1 mRNA, and
must include the well known catalytic sequence responsible
for mRNA cleavage. For this sequence, see U.S. Pat. No.
5,093,246, which is incorporated by reference herein in its
entirety. As such, within the scope of the invention are
engineered hammerhead motif ribozyme molecules that
specifically and efficiently catalyze endonucleolytic
cleavage of RNA sequences encoding PKD1, preferably mutant
PKD1 proteins.

Specific ribozyme cleavage sites within any potential
RNA target are initially identified by scanning the target
molecule for ribozyme cleavage sites which include the
following sequences: GUA, GUU and GUC. Once identified,
short RNA sequences of between 15 and 20 ribonucleotides
corresponding to the region of the target gene containing the
cleavage site may be evaluated for predicted structural
features, such as secondary structure, that may render the
oligonucleotide sequence unsuitable. The suitability of
candidate targets may also be evaluated by testing their
accessibility to hybridization with complementary
oligonucleotides, using ribonuclease protection assays.
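The scanning step can be sketched in Python. This assumes the target is supplied as a single RNA string (cDNA is accepted and converted T to U) and takes a 15-nucleotide window roughly centred on each GUA, GUU or GUC triplet; the window placement and function name are assumptions, and the structural and accessibility evaluations described above still have to be carried out separately.

```python
import re


def candidate_sites(target: str, window: int = 15):
    """List candidate ribozyme cleavage sites as (position, triplet, window_seq).

    position is the 0-based index of the first base of a GUA, GUU or GUC
    triplet. window_seq is `window` nucleotides (15-20, per the text) roughly
    centred on the triplet's middle base; sites too near either end are skipped.
    """
    if not 15 <= window <= 20:
        raise ValueError("window should be between 15 and 20 nucleotides")
    rna = target.upper().replace("T", "U")  # accept cDNA as well as RNA
    sites = []
    for match in re.finditer(r"GU[AUC]", rna):
        pos = match.start()
        begin = pos + 1 - window // 2
        end = begin + window
        if begin >= 0 and end <= len(rna):
            sites.append((pos, match.group(0), rna[begin:end]))
    return sites
```

Each returned window is a candidate 15-mer for the secondary-structure and ribonuclease protection checks; triplets lying too close to either end of the supplied sequence are not reported.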

Nucleic acid molecules to be used in triple helix
formation should be single stranded and composed of
deoxynucleotides. The base composition of these
oligonucleotides must be designed to promote triple helix
formation via Hoogsteen base pairing rules, which generally
require sizeable stretches of either purines or pyrimidines
to be present on one strand of a duplex. Nucleotide
sequences may be pyrimidine-based, which will result in TAT
and CGC+ triplets across the three associated strands of the
resulting triple helix. The pyrimidine-rich molecules provide
base complementarity to a purine-rich region of a single
strand of the duplex in a parallel orientation to that
strand. In addition, nucleic acid molecules may be chosen
that are purine-rich, for example, containing a stretch of
guanine residues. These molecules will form a triple helix
with a DNA duplex that is rich in GC pairs, in which the
majority of the purine residues are located on a single
strand of the targeted duplex, resulting in GGC triplets
across the three strands in the triplex.
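Because Hoogsteen triplex formation generally requires a sizeable purine stretch on one strand of the duplex, candidate target regions can be located by scanning one strand for uninterrupted purine (A/G) or pyrimidine (C/T) runs; a pyrimidine run on the given strand is a purine run on its complement. A minimal sketch, in which the 12-nucleotide minimum length is an assumed cut-off and the function name is hypothetical:

```python
import re


def homopurine_tracts(strand: str, min_length: int = 12):
    """Find duplex regions in which one strand is purine-only.

    strand is one strand of the target duplex, written 5'->3'. A purine (A/G)
    run on this strand, or a pyrimidine (C/T) run on it (i.e. a purine run on
    the complementary strand), is reported as (start, end, purine_strand),
    with 0-based, end-exclusive coordinates on the given strand.
    """
    strand = strand.upper()
    tracts = []
    for pattern, purine_strand in ((r"[AG]+", "given"), (r"[CT]+", "complementary")):
        for match in re.finditer(pattern, strand):
            if match.end() - match.start() >= min_length:
                tracts.append((match.start(), match.end(), purine_strand))
    return sorted(tracts)
```

A tract labelled `"given"` would be paired with a pyrimidine-rich or G-rich third strand as described above; a tract labelled `"complementary"` places the purines on the opposite strand of the duplex.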

Alternatively, the potential sequences that can be
targeted for triple helix formation may be increased by
creating a so-called "switchback" nucleic acid molecule.
Switchback molecules are synthesized in an alternating 5'-3',
3'-5' manner, such that they base pair with one strand of a
duplex first and then the other, eliminating the necessity
for a sizeable stretch of either purines or pyrimidines to be
present on one strand of a duplex.

It is possible that the antisense, ribozyme, and/or
triple helix molecules described herein may reduce or inhibit
the translation of mRNA produced by both normal and mutant
PKD1 alleles. To ensure that substantially normal levels of
PKD1 activity are maintained in the cell, nucleic acid
molecules that encode and express PKD1 polypeptides
exhibiting normal PKD1 activity, and that do not contain
sequences susceptible to the antisense, ribozyme, or triple
helix treatment being used, may be introduced into cells.
Such sequences may be introduced via gene therapy methods such
as those described below in Section 5.5. Alternatively, it
may be preferable to coadminister normal PKD1 protein into
the cell or tissue in order to maintain the requisite level
of cellular or tissue PKD1 activity.
Antisense RNA and DNA molecules, ribozyme molecules and
triple helix molecules of the invention may be prepared by
any method known in the art for the synthesis of DNA and RNA
molecules. These include techniques for chemically
synthesizing oligodeoxyribonucleotides and
oligoribonucleotides well known in the art, such as, for
example, solid phase phosphoramidite chemical synthesis.
Alternatively, RNA molecules may be generated by in vitro and
in vivo transcription of DNA sequences encoding the antisense
RNA molecule. Such DNA sequences may be incorporated into a
wide variety of vectors that incorporate suitable RNA
polymerase promoters, such as the T7 or SP6 polymerase
promoters. Alternatively, antisense cDNA constructs that
synthesize antisense RNA constitutively or inducibly,
depending on the promoter used, can be introduced stably into
cell lines.

Various well-known modifications to the DNA molecules
may be introduced as a means of increasing intracellular
stability and half-life. Possible modifications include, but
are not limited to, the addition of flanking sequences of
ribo- or deoxynucleotides to the 5' and/or 3' ends of the
molecule, or the use of phosphorothioate or 2'-O-methyl rather
than phosphodiester linkages within the
oligodeoxyribonucleotide backbone.


5.9.2. ANTIBODIES THAT REACT WITH PKD1 GENE PRODUCT
Antibodies that are both specific for mutant PKD1 gene
product and interfere with its activity may be used. Such
antibodies may be generated, using standard techniques
described in Section 5.3, supra, against the proteins
themselves or against peptides corresponding to the binding
domains of the proteins. Such antibodies include, but are not
limited to, polyclonal, monoclonal, Fab fragments, F(ab')2
fragments, single chain antibodies, chimeric antibodies,
humanized antibodies, etc.

The PKD1 protein appears to be an extracellular protein.
Therefore, any of the administration techniques described
below in Section 5.11 that are appropriate for peptide
administration may be utilized to effectively administer
inhibitory PKD1 antibodies to their site of action.

5.10. METHODS FOR RESTORING PKD1 ACTIVITY

As discussed above, dominant mutations in the PKD1 gene
that cause ADPKD may lower the level of expression of the
PKD1 gene or, alternatively, may cause inactive or
substantially inactive PKD1 proteins to be formed. In either
instance, the result is an overall lower level of normal PKD1
activity in the tissues or cells in which PKD1 is normally
expressed. This lower level of PKD1 activity then leads to
ADPKD symptoms. Thus, such PKD1 mutations represent dominant
loss-of-function mutations. Described in this Section are
methods whereby the level of normal PKD1 activity may be
increased to levels at which ADPKD symptoms are ameliorated.

For example, normal PKD1 protein, at a level sufficient
to ameliorate ADPKD symptoms, may be administered to a patient
exhibiting such symptoms. Any of the techniques discussed
below in Section 5.11 may be utilized for such
administration. One of skill in the art will readily know
how to determine the concentration of effective, non-toxic
doses of the normal PKD1 protein, utilizing techniques such
as those described below in Section 5.11.
Additionally, DNA sequences encoding normal PKD1 protein
may be directly administered to a patient exhibiting ADPKD
symptoms, at a concentration sufficient to produce a level of
PKD1 protein such that ADPKD symptoms are ameliorated. Any
of the techniques discussed below in Section 5.11 that
achieve intracellular administration of compounds, such as,
for example, liposome administration, may be utilized for the
administration of such DNA molecules. The DNA molecules may
be produced, for example, by recombinant techniques such as
those described above in Section 5.1 and its subsections.
Further, patients with these types of mutations may be
treated by gene replacement therapy. A copy of the normal
PKD1 gene, or a part of the gene that directs the production
of a normal PKD1 protein with the function of the PKD1
protein, may be inserted into cells, renal cells, for example,
using viral or non-viral vectors, which include, but are not
limited to, vectors derived from, for example, retroviruses,
vaccinia virus, adeno-associated virus, herpes viruses,
bovine papilloma virus, or additional non-viral vectors, such
as plasmids. In addition, techniques frequently employed by
those skilled in the art for introducing DNA into mammalian
cells may be utilized. For example, methods including, but
not limited to, electroporation, DEAE-dextran mediated DNA
transfer, DNA guns, liposomes, direct injection, and the like
may be utilized to transfer recombinant vectors into host
cells. Alternatively, the DNA may be transferred into cells
through conjugation to proteins that are normally targeted to
the inside of a cell. For example, the DNA may be conjugated
to viral proteins that normally target viral particles into
the targeted host cell. Additionally, techniques such as
those described above in Sections 5.1 and 5.2 and their
subsections may be utilized for the introduction of
normal PKD1 gene sequences into human cells.

The PKD1 gene is very large and, further, encodes a very
large transcript of approximately 14 kb. Additionally, the
PKD1 gene product is large, having 4304 amino acids, with a
molecular weight of about 467 kD. It is possible, therefore,
that the introduction of the entire PKD1 coding region may be
cumbersome and potentially inefficient as a gene therapy
approach. However, because the entire PKD1 gene product may
not be necessary to avoid the appearance of ADPKD symptoms,
the use of a "minigene" therapy approach (see, e.g., Ragot,
T. et al., 1993, Nature 361:647; Dunckley, M.G. et al., 1993,
Hum. Mol. Genet. 2:717-723) can serve to ameliorate such
ADPKD symptoms.
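
The size argument can be made concrete with a little arithmetic. The sketch below uses only the figures quoted in the text (4304 residues, about 467 kD, a transcript of about 14 kb). The vector capacities are rough, commonly cited values included as illustrative assumptions; they are not figures from this document.

```python
# Figures quoted in the text above.
PKD1_RESIDUES = 4304
PKD1_MW_KDA = 467
TRANSCRIPT_KB = 14

# Approximate insert capacities often cited for two vector families.
# Illustrative assumptions only; not values given in this document.
APPROX_CAPACITY_NT = {"adeno-associated virus": 4_700, "retrovirus": 8_000}


def coding_length_nt(residues: int) -> int:
    """Nucleotides needed to encode `residues` amino acids plus a stop codon."""
    return 3 * residues + 3


def mean_residue_mass_da(mw_kda: float, residues: int) -> float:
    """Average mass per residue implied by a molecular weight."""
    return mw_kda * 1000 / residues


def vectors_that_fit(insert_nt: int) -> list[str]:
    """Vectors whose assumed capacity accommodates an insert of this size."""
    return [name for name, cap in APPROX_CAPACITY_NT.items() if insert_nt <= cap]


if __name__ == "__main__":
    cds = coding_length_nt(PKD1_RESIDUES)
    print(f"coding region: {cds} nt of a ~{TRANSCRIPT_KB} kb transcript")
    print(f"mean residue mass: {mean_residue_mass_da(PKD1_MW_KDA, PKD1_RESIDUES):.1f} Da")
    print("full-length CDS fits in:", vectors_that_fit(cds) or "neither assumed vector")
```

The coding region alone is about 12.9 kb. The implied mean residue mass of about 108.5 Da is in line with a typical protein, so the quoted size figures are mutually consistent.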

Such a minigene system comprises the use of a portion of
the PKD1 coding region that encodes a partial, yet active or
substantially active, PKD1 gene product. As used herein,
"substantially active" signifies that the gene product serves
to ameliorate ADPKD symptoms. Thus, the minigene system
utilizes only that portion of the normal PKD1 gene which
encodes a portion of the PKD1 gene product capable of
ameliorating ADPKD symptoms, and may, therefore, represent an
effective and even more efficient ADPKD gene therapy than
full-length gene therapy approaches. Such a minigene can be
inserted into cells and utilized via the procedures
described above for full-length gene replacement. The
cells into which the PKD1 minigene is to be introduced are,
preferably, those cells, such as renal cells, that are
affected by ADPKD. Alternatively, any suitable cell can be
transfected with a PKD1 minigene as long as the minigene is
expressed in a sustained, stable fashion and produces a gene
product that ameliorates ADPKD symptoms. The regulatory
sequences by which such a PKD1 minigene can be successfully
expressed will vary depending upon the cell into which the
minigene is introduced. The skilled artisan will be aware of
appropriate regulatory sequences for the given cell to be
used. Techniques for such introduction and sustained
expression are routine and are well known to those of skill
in the art.

A therapeutic minigene for the amelioration of ADPKD
symptoms can comprise a nucleotide sequence that encodes at
least one PKD1 gene product peptide domain, as shown in FIGS.
7 and 8. For example, such PKD1 peptide domains (the
approximate amino acid residue positions of which are listed
in parentheses after each domain name) can include a
leucine-rich repeat domain (72 to 94, or $1 to 119) and/or a
cysteine-rich repeat domain (32 to 65), a C-type (calcium-
dependent) lectin protein domain (405 to 534), an LDL-A
module (641 to 671), one or more PKD domains (282 to 353;
1032 to 1124; 1138 to 1209; 1221 to 1292; 1305 to 1377; 1390
to 1463; 1477 to 1545; 1559 to 1629; 1643 to 1715; 1729 to
1799; 1815 to 1884; 1898 to 1968; 1983 to 2058; 2071 to
2142), or at least one C-terminal domain (2160 to 4304)
(i.e., a peptide domain found in the C-terminal half of the
PKD1 gene product). Minigenes that encode such PKD1 gene
products can be synthesized and/or engineered using the PKD1
gene sequence (SEQ ID NO:1) disclosed herein, and by
utilizing the amino acid residue domain designations found in
FIGS. 7 and 8.
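
The domain designations can be turned into coding-sequence coordinates when planning a minigene. The following is a minimal Python sketch, assuming the residue ranges listed above and assuming, as a simplification, that the minigene is one contiguous segment from the first to the last selected residue. Coordinates are counted from the A of the initiating ATG, not in SEQ ID NO:1 numbering. The second leucine-rich repeat range is illegible in the source and is left out. All names are hypothetical.

```python
# Approximate residue ranges (1-based, inclusive) transcribed from the text.
# The second leucine-rich repeat range is illegible in the source and omitted.
PKD1_DOMAINS = {
    "cysteine-rich repeat": [(32, 65)],
    "leucine-rich repeat": [(72, 94)],
    "C-type lectin": [(405, 534)],
    "LDL-A module": [(641, 671)],
    "PKD": [(282, 353), (1032, 1124), (1138, 1209), (1221, 1292),
            (1305, 1377), (1390, 1463), (1477, 1545), (1559, 1629),
            (1643, 1715), (1729, 1799), (1815, 1884), (1898, 1968),
            (1983, 2058), (2071, 2142)],
    "C-terminal": [(2160, 4304)],
}


def residues_to_cds_nt(first: int, last: int) -> tuple[int, int]:
    """Map a residue range to nucleotides counted from the A of the ATG (= 1)."""
    if not 1 <= first <= last:
        raise ValueError("invalid residue range")
    return 3 * (first - 1) + 1, 3 * last


def contiguous_minigene_span(domain_names: list[str]) -> tuple[int, int]:
    """Smallest contiguous CDS segment covering every range of the named domains."""
    ranges = [r for name in domain_names for r in PKD1_DOMAINS[name]]
    return residues_to_cds_nt(min(a for a, _ in ranges), max(b for _, b in ranges))


if __name__ == "__main__":
    lo, hi = contiguous_minigene_span(["C-type lectin", "LDL-A module"])
    print(f"CDS nt {lo}-{hi} ({hi - lo + 1} nt)")
```

A real construct would also need the cell-appropriate regulatory sequences noted above, and it need not be a single contiguous segment.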

One way in which PKD1 minigene product activity can be
assayed involves the use of PKD1 knockout animal models.
Such animal models express an insufficient
level of the PKD1 gene product. The production of such
animal models may be as described above in Section 5.8.2,
and involves methods well known to those of skill in the art.
PKD1 minigenes can be introduced into the PKD1 knockout
animal models as, for example, described above in this
Section. The activity of the minigene can then be assessed
by assaying for the amelioration of ADPKD-like symptoms.
Thus, the relative importance of each of the PKD1 peptide
domains, individually and/or in combination, with respect to
PKD1 gene activity can be determined.

Cells, preferably autologous cells, containing normal
PKD1-expressing gene sequences may then be introduced or
reintroduced into the patient at positions that allow for
the amelioration of ADPKD symptoms. Such cell replacement
techniques may be preferred, for example, when the PKD1 gene
product is a secreted, extracellular gene product.

5.11. PHARMACEUTICAL PREPARATIONS
AND METHODS OF ADMINISTRATION

The identified compounds that inhibit PKD1 expression,
synthesis and/or activity can be administered to a patient at
therapeutically effective doses to treat polycystic kidney
disease. A therapeutically effective dose refers to that
amount of the compound sufficient to result in amelioration
of symptoms of polycystic kidney disease.

5.11.1. EFFECTIVE DOSE
Toxicity and therapeutic efficacy of such compounds can
be determined by standard pharmaceutical procedures in cell
cultures or experimental animals, e.g., for determining the
LD50 (the dose lethal to 50% of the population) and the ED50
(the dose therapeutically effective in 50% of the
population). The dose ratio between toxic and therapeutic
effects is the therapeutic index, and it can be expressed as
the ratio LD50/ED50. Compounds that exhibit large therapeutic
indices are preferred. While compounds that exhibit toxic
side effects may be used, care should be taken to design a
delivery system that targets such compounds to the site of
affected tissue in order to minimize potential damage to
unaffected cells and, thereby, reduce side effects.
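
The therapeutic index defined above is a simple ratio, but each 50% dose must first be estimated from dose-response data. The Python sketch below is one minimal way to do that, assuming linear interpolation on log10(dose) between the two bracketing measurements. Probit or logistic fits are common alternatives, and the text does not prescribe a method. The data in the usage block are hypothetical.

```python
import math


def therapeutic_index(ld50: float, ed50: float) -> float:
    """Therapeutic index as defined above: LD50 / ED50."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("doses must be positive")
    return ld50 / ed50


def dose_at_response(doses, fractions, target=0.5):
    """Dose giving `target` fractional response, interpolated on log10(dose).

    `doses` must be positive and increasing; `fractions` (0-1) should rise
    with dose. Only the first bracketing pair of measurements is used.
    """
    pts = list(zip(doses, fractions))
    for (d0, f0), (d1, f1) in zip(pts, pts[1:]):
        if f0 <= target <= f1 and f1 > f0:
            t = (target - f0) / (f1 - f0)
            log_d = math.log10(d0) + t * (math.log10(d1) - math.log10(d0))
            return 10 ** log_d
    raise ValueError("target response is not bracketed by the data")


if __name__ == "__main__":
    # Hypothetical animal data: dose (mg/kg) and fraction of animals responding.
    doses = [1, 10, 100, 1000]
    ed50 = dose_at_response(doses, [0.2, 0.4, 0.8, 1.0])
    ld50 = dose_at_response(doses, [0.0, 0.0, 0.1, 0.6])
    print(f"ED50 ~ {ed50:.1f}, LD50 ~ {ld50:.0f}, TI ~ {therapeutic_index(ld50, ed50):.0f}")
```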

The data obtained from the cell culture assays and
animal studies can be used in formulating a range of dosage
for use in humans. The dosage of such compounds lies
preferably within a range of circulating concentrations that
include the ED50 with little or no toxicity. The dosage may
vary within this range depending upon the dosage form
employed and the route of administration utilized. For any
compound used in the method of the invention, the
therapeutically effective dose can be estimated initially
from cell culture assays. A dose may be formulated in animal
models to achieve a circulating plasma concentration range
that includes the IC50 (i.e., the concentration of the test
compound that achieves a half-maximal inhibition of
symptoms) as determined in cell culture. Such information
can be used to more accurately determine useful doses in
humans. Levels in plasma may be measured, for example, by
high performance liquid chromatography. Additional factors
that may be utilized to optimize dosage can include, for
example, the severity of the ADPKD symptoms as well as the
age, weight and possible additional disorders of the
patient. Those skilled in the art will be able to determine
the appropriate dose based on the above factors.
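
The check described above, whether a dosing regimen yields plasma levels whose range includes the cell-culture IC50, can be sketched as follows. This is a minimal illustration only, assuming the concentrations come from equally spaced sampling times and share the IC50's units. The optional `toxic_threshold` parameter is a hypothetical stand-in for the "little or no toxicity" condition and is not a quantity defined in the text.

```python
def assess_plasma_levels(concentrations, ic50, toxic_threshold=None):
    """Summarise measured plasma concentrations against a cell-culture IC50.

    concentrations: values from equally spaced sampling times (assumed),
        e.g. as measured by HPLC, in the same units as ic50.
    toxic_threshold: hypothetical ceiling above which toxicity was seen.
    """
    if not concentrations:
        raise ValueError("no measurements supplied")
    at_or_above = sum(c >= ic50 for c in concentrations)
    report = {
        "range_includes_ic50": min(concentrations) <= ic50 <= max(concentrations),
        "fraction_of_samples_at_or_above_ic50": at_or_above / len(concentrations),
    }
    if toxic_threshold is not None:
        report["exceeds_toxic_threshold"] = max(concentrations) > toxic_threshold
    return report


if __name__ == "__main__":
    # Hypothetical plasma levels (uM) from one animal; IC50 from cell culture.
    print(assess_plasma_levels([0.5, 2.0, 4.0, 1.0], ic50=1.5, toxic_threshold=5.0))
```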

5.11.2. FORMULATIONS AND USE
Pharmaceutical compositions for use in accordance with
the present invention may be formulated in conventional
manner using one or more physiologically acceptable carriers 
or excipients. 

Thus, the compounds and their physiologically acceptable
salts and solvates may be formulated for administration by
inhalation or insufflation (either through the mouth or the
nose) or for oral, buccal, parenteral or rectal administration.

For oral administration, the pharmaceutical compositions
may take the form of, for example, tablets or capsules
prepared by conventional means with pharmaceutically
acceptable excipients such as binding agents (e.g.,
pregelatinised maize starch, polyvinylpyrrolidone or
hydroxypropyl methylcellulose); fillers (e.g., lactose,
microcrystalline cellulose or calcium hydrogen phosphate);
lubricants (e.g., magnesium stearate, talc or silica);
disintegrants (e.g., potato starch or sodium starch
glycollate); or wetting agents (e.g., sodium lauryl
sulphate). The tablets may be coated by methods well known
in the art. Liquid preparations for oral administration may
take the form of, for example, solutions, syrups or
suspensions, or they may be presented as a dry product for
constitution with water or other suitable vehicle before use.
Such liquid preparations may be prepared by conventional
means with pharmaceutically acceptable additives such as
suspending agents (e.g., sorbitol syrup, cellulose
derivatives or hydrogenated edible fats); emulsifying agents
(e.g., lecithin or acacia); non-aqueous vehicles (e.g.,
almond oil, oily esters, ethyl alcohol or fractionated
vegetable oils); and preservatives (e.g., methyl or
propyl-p-hydroxybenzoates or sorbic acid). The preparations
may also contain buffer salts, flavoring, coloring and
sweetening agents as appropriate.

Preparations for oral administration may be suitably
formulated to give controlled release of the active compound.
For buccal administration, the compositions may take the
form of tablets or lozenges formulated in conventional
manner.
For administration by inhalation, the compounds for use
according to the present invention are conveniently delivered
in the form of an aerosol spray presentation from pressurized
packs or a nebuliser, with the use of a suitable propellant,
e.g., dichlorodifluoromethane, trichlorofluoromethane,
dichlorotetrafluoroethane, carbon dioxide or other suitable
gas. In the case of a pressurized aerosol, the dosage unit
may be determined by providing a valve to deliver a metered
amount. Capsules and cartridges of, e.g., gelatin, for use in
an inhaler or insufflator may be formulated containing a
powder mix of the compound and a suitable powder base such as
lactose or starch.

The compounds may be formulated for parenteral
administration by injection, e.g., by bolus injection or
continuous infusion. Formulations for injection may be
presented in unit dosage form, e.g., in ampoules or in
multi-dose containers, with an added preservative. The
compositions may take such forms as suspensions, solutions or
emulsions in oily or aqueous vehicles, and may contain
formulatory agents such as suspending, stabilizing and/or
dispersing agents. Alternatively, the active ingredient may
be in powder form for constitution with a suitable vehicle,
e.g., sterile pyrogen-free water, before use.

The compounds may also be formulated in rectal
compositions such as suppositories or retention enemas, e.g.,
containing conventional suppository bases such as cocoa
butter or other glycerides.

In addition to the formulations described previously,
the compounds may also be formulated as a depot preparation.
Such long-acting formulations may be administered by
implantation (for example, subcutaneously or intramuscularly)
or by intramuscular injection. Thus, for example, the
compounds may be formulated with suitable polymeric or
hydrophobic materials (for example, as an emulsion in an
acceptable oil) or ion exchange resins, or as sparingly
soluble derivatives, for example, as a sparingly soluble
salt.
The compositions may, if desired, be presented in a pack
or dispenser device that may contain one or more unit dosage
forms containing the active ingredient. The pack may, for
example, comprise metal or plastic foil, such as a blister
pack. The pack or dispenser device may be accompanied by
instructions for administration.

5.12. DIAGNOSIS OF PKD1 ABNORMALITIES
A variety of methods may be employed, utilizing reagents
such as the PKD1 nucleotide sequences described in Section 5.1
and antibodies directed against PKD1 gene product or
peptides, as described above in Section 5.1.3.
Specifically, such reagents may be used for the detection of
the presence of PKD1 mutations, i.e., molecules present in
diseased tissue but absent from, or present in greatly
reduced levels relative to, the corresponding non-diseased
tissue.

The methods described herein may be performed, for
example, by utilizing pre-packaged diagnostic kits comprising
at least one specific PKD1 nucleic acid or anti-PKD1 antibody
reagent described herein, which may be conveniently used,
e.g., in clinical settings, to diagnose patients exhibiting
PKD1 abnormalities.

Any tissue in which the PKD1 gene is expressed may be
utilized in the diagnostics described below.

5.12.1. DETECTION OF PKD1 NUCLEIC ACIDS

RNA from the tissue to be analyzed may be isolated
using procedures that are well known to those in the art.
Diagnostic procedures may also be performed in situ directly
upon tissue sections (fixed and/or frozen) of patient tissue
obtained from biopsies or resections, such that no RNA
purification is necessary. Nucleic acid reagents such as
those described in Section 5.1 and its subsections may be
used as probes and/or primers for such in situ procedures
(Nuovo, G.J., 1992, PCR in situ hybridization: protocols and
applications, Raven Press, NY).
PKD1 nucleotide sequences, either RNA or DNA, may, for
example, be used in hybridization or amplification assays of
biological samples to detect abnormalities of PKD1
expression, e.g., Southern or Northern analysis, single
stranded conformational polymorphism (SSCP) analysis,
including in situ hybridization assays, or, alternatively,
polymerase chain reaction analyses. Such analyses may reveal
both quantitative abnormalities in the expression pattern of
the PKD1 gene and, if the PKD1 mutation is, for example, an
extensive deletion or the result of a chromosomal
rearrangement, more qualitative aspects of the
PKD1 abnormality.

Preferred diagnostic methods for the detection of
PKD1-specific nucleic acid molecules may involve, for example,
contacting and incubating nucleic acids, derived from the
target tissue being analyzed, with one or more labeled
nucleic acid reagents as are described in Section 5.1, under
conditions favorable for the specific annealing of these
reagents to their complementary sequences within the target
molecule. Preferably, the lengths of these nucleic acid
reagents are at least 15 to 30 nucleotides. After incubation,
all non-annealed nucleic acids are removed. The presence of
nucleic acids from the target tissue that have hybridized,
if any such molecules exist, is then detected. Using such a
detection scheme, the target tissue nucleic acid may be
immobilized, for example, to a solid support such as a
membrane, or a plastic surface such as that on a microtiter
plate or polystyrene beads. In this case, after incubation,
non-annealed, labeled nucleic acid reagents of the type
described in Section 5.1 and its subsections are easily
removed. Detection of the remaining, annealed, labeled
nucleic acid reagents is accomplished using standard
techniques well known to those in the art.
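
The annealing step described above can be modelled in silico as a search for the probe's reverse complement in the target, since a probe pairs antiparallel with its complementary sequence. The Python sketch below is a minimal illustration, assuming perfect matching only (real hybridization tolerates some mismatches, depending on stringency). It uses the Wallace rule, Tm of about 2(A+T) + 4(G+C) degrees C, as a rough guide for short oligonucleotides; the text does not specify that rule. The sequences in the usage block are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(probe: str) -> int:
    """Wallace rule, Tm ~ 2(A+T) + 4(G+C) deg C; a rough guide for short oligos."""
    p = probe.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def annealing_sites(target: str, probe: str) -> list[int]:
    """0-based positions on `target` (5'->3') where `probe` pairs perfectly.

    The probe anneals antiparallel, so it binds where the target contains
    the probe's reverse complement. Mismatches are not tolerated.
    """
    if not 15 <= len(probe) <= 30:
        raise ValueError("the text suggests probes of 15 to 30 nt")
    site = reverse_complement(probe)
    t = target.upper()
    return [i for i in range(len(t) - len(site) + 1) if t.startswith(site, i)]


if __name__ == "__main__":
    target = "TTTTACGTAGCTAGCTAGGCTTTT"         # hypothetical target fragment
    probe = reverse_complement(target[4:19])    # 15-nt probe for that region
    print(annealing_sites(target, probe), wallace_tm(probe))
```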

Alternative diagnostic methods for the detection of
PKD1-specific nucleic acid molecules may involve their
amplification, e.g., by PCR (the experimental embodiment set
forth in Mullis, K.B., 1987, U.S. Patent No. 4,683,202),
ligase chain reaction (Barany, F., 1991, Proc. Natl. Acad.
Sci. USA 88:189-193), self-sustained sequence replication
(Guatelli, J.C. et al., 1990, Proc. Natl. Acad. Sci. USA
87:1874-1878), transcriptional amplification system (Kwoh,
D.Y. et al., 1989, Proc. Natl. Acad. Sci. USA 86:1173-1177),
Q-Beta Replicase (Lizardi, P.M. et al., 1988, Bio/Technology
6:1197), or any other RNA amplification method, followed by
the detection of the amplified molecules using techniques
well known to those of skill in the art. These detection
schemes are especially useful for the detection of RNA
molecules if such molecules are present in very low numbers.

In one embodiment of such a detection scheme, a cDNA
molecule is obtained from the target RNA molecule (e.g., by
reverse transcription of the RNA molecule into cDNA).
Tissues from which such RNA may be isolated include any
tissue in which wild type PKD1 is known to be expressed,
including, but not limited to, kidney tissue and lymphocyte
tissue. A target sequence within the cDNA is then used as
the template for a nucleic acid amplification reaction, such
as a PCR amplification reaction, or the like. The nucleic
acid reagents used as synthesis initiation reagents (e.g.,
primers) in the reverse transcription and nucleic acid
amplification steps of this method are chosen from among the
PKD1 nucleic acid reagents described in Section 5.1 and its
subsections. The preferred lengths of such nucleic acid
reagents are at least 15-30 nucleotides. For detection of
the amplified product, the nucleic acid amplification may be
performed using radioactively or non-radioactively labeled
nucleotides. Alternatively, enough amplified product may be
made such that the product may be visualized by standard
ethidium bromide staining or by utilizing any other suitable
nucleic acid staining method.
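
The reverse transcription and amplification steps can be mirrored in a short sequence-level sketch that predicts which product a primer pair should give. This Python is illustrative only, assuming perfect primer matches and ignoring reaction conditions. The mRNA and primers in the usage block are hypothetical, and the primers are deliberately shorter than the 15-30 nt recommended above, for readability.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def reverse_transcribe(mrna: str) -> str:
    """First-strand cDNA (5'->3'): reverse complement of the mRNA, U read as T."""
    return reverse_complement(mrna.upper().replace("U", "T"))


def predicted_amplicon(sense: str, fwd: str, rev: str) -> Optional[str]:
    """Expected PCR product on a duplex whose sense strand is `sense`.

    The forward primer matches the sense strand; the reverse primer matches
    the antisense strand, so its reverse complement must lie downstream on
    the sense strand. Only perfect matches are considered.
    """
    s = sense.upper()
    start = s.find(fwd.upper())
    if start == -1:
        return None
    rev_site = s.find(reverse_complement(rev), start + len(fwd))
    if rev_site == -1:
        return None
    return s[start:rev_site + len(rev)]


if __name__ == "__main__":
    mrna = "AUGGCUAGCUUCGGAUCCGAUCGAUCGGCUAGCUAAG"   # hypothetical, not PKD1
    cdna = reverse_transcribe(mrna)                 # product of the RT step
    sense = reverse_complement(cdna)                # second strand made in PCR
    product = predicted_amplicon(sense, "GCTAGCTTC", "CTTAGCTAGCC")
    print(len(product), product)
```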

5.12.2. DETECTION OF PKD1 GENE PRODUCT AND PEPTIDES

Antibodies directed against wild type or mutant PKD1
gene product or peptides, which are discussed above in
Section 5.3, may also be used as ADPKD diagnostics, as
described, for example, herein. Such diagnostic methods may
be used to detect abnormalities in the level of PKD1 protein
expression, or abnormalities in the tissue, cellular, or
subcellular location of PKD1 protein. In addition,
differences in the size, electronegativity, or antigenicity
of the mutant PKD1 protein relative to the normal PKD1
protein may also be detected.
Protein from the tissue to be analyzed may easily be
isolated using techniques that are well known to those of
skill in the art. The protein isolation methods employed
herein may, for example, be such as those described in Harlow
and Lane (Harlow, E. and Lane, D., 1988, "Antibodies: A
Laboratory Manual", Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, New York), which is incorporated herein by
reference in its entirety.

Preferred diagnostic methods for the detection of wild
type or mutant PKD1 gene product or peptide molecules may
involve, for example, immunoassays wherein PKD1 peptides are
detected by their interaction with an anti-PKD1 specific
peptide antibody.

For example, antibodies, or fragments of antibodies,
such as those described above in Section 5.3, useful in the
present invention may be used to quantitatively or
qualitatively detect the presence of wild type or mutant PKD1
peptides. This can be accomplished, for example, by
immunofluorescence techniques employing a fluorescently
labeled antibody (see below) coupled with light microscopic,
flow cytometric, or fluorimetric detection. Such techniques
are especially preferred if PKD1 gene products or peptides
are expressed on the cell surface.

The antibodies (or fragments thereof) useful in the
present invention may, additionally, be employed
histologically, as in immunofluorescence or immunoelectron
microscopy, for in situ detection of PKD1 gene product or
peptides. In situ detection may be accomplished by removing
a histological specimen from a patient, and applying thereto
a labeled antibody of the present invention. The
histological sample may be taken from a tissue suspected of
exhibiting ADPKD. The antibody (or fragment) is preferably
applied by overlaying the labeled antibody (or fragment) onto
a biological sample. Through the use of such a procedure, it
is possible to determine not only the presence of the PKD1
peptides, but also their distribution in the examined tissue.
Using the present invention, those of ordinary skill will
readily perceive that any of a wide variety of histological
methods (such as staining procedures) can be modified in
order to achieve such in situ detection.

Immunoassays for wild type or mutant PKD1 gene product
or peptides typically comprise incubating a biological
sample, such as a biological fluid, a tissue extract, freshly
harvested cells, or cells that have been incubated in tissue
culture, in the presence of a detectably labeled antibody
capable of identifying PKD1 peptides, and detecting the bound
antibody by any of a number of techniques well known in the
art.

The biological sample may be brought in contact with and
immobilized onto a solid phase support or carrier such as
nitrocellulose, or other solid support that is capable of
immobilizing cells, cell particles or soluble proteins. The
support may then be washed with suitable buffers followed by
treatment with the detectably labeled PKD1-specific antibody.
The solid phase support may then be washed with the buffer a
second time to remove unbound antibody. The amount of bound
label on the solid support may then be detected by conventional
means.

By "solid phase support or carrier" is intended any
support capable of binding an antigen or an antibody.
Well-known supports or carriers include glass, polystyrene,
polypropylene, polyethylene, dextran, nylon, amylases,
natural and modified celluloses, polyacrylamides, gabbros,
and magnetite. The nature of the carrier can be either
soluble to some extent or insoluble for the purposes of the
present invention. The support material may have virtually
any possible structural configuration so long as the coupled
molecule is capable of binding to an antigen or antibody.
Thus, the support configuration may be spherical, as in a
bead, or cylindrical, as in the inside surface of a test
tube, or the external surface of a rod. Alternatively, the
surface may be flat, such as a sheet, test strip, etc.
Preferred supports include polystyrene beads. Those skilled
in the art will know many other suitable carriers for binding
antibody or antigen, or will be able to ascertain the same by
use of routine experimentation.

The binding activity of a given lot of anti-wild type or
mutant PKD1 peptide antibody may be determined according to
well known methods. Those skilled in the art will be able to
determine operative and optimal assay conditions for each
determination by employing routine experimentation.

One of the ways in which the PKD1 peptide-specific
antibody can be detectably labeled is by linking the same to
an enzyme and using it in an enzyme immunoassay (EIA) (Voller, A.,
"The Enzyme Linked Immunosorbent Assay (ELISA)", Diagnostic
Horizons 2:1-7, 1978, Microbiological Associates Quarterly
Publication, Walkersville, MD; Voller, A. et al., J. Clin.
Pathol. 31:507-520 (1978); Butler, J.E., Meth. Enzymol.
73:482-523 (1981); Maggio, E. (ed.), ENZYME IMMUNOASSAY, CRC
Press, Boca Raton, FL, 1980; Ishikawa, E. et al. (eds.),
ENZYME IMMUNOASSAY, Igaku Shoin, Tokyo, 1981). The enzyme
that is bound to the antibody will react with an appropriate
substrate, preferably a chromogenic substrate, in such a
manner as to produce a chemical moiety that can be detected,
for example, by spectrophotometric, fluorimetric or visual
means. Enzymes that can be used to detectably label the
antibody include, but are not limited to, malate
dehydrogenase, staphylococcal nuclease, delta-5-steroid
isomerase, yeast alcohol dehydrogenase, alpha-glycerophosphate
dehydrogenase, triose phosphate isomerase,
horseradish peroxidase, alkaline phosphatase, asparaginase,
glucose oxidase, beta-galactosidase, ribonuclease, urease,
catalase, glucose-6-phosphate dehydrogenase, glucoamylase and
acetylcholinesterase. The detection can be accomplished by
colorimetric methods that employ a chromogenic substrate for
the enzyme. Detection may also be accomplished by visual
comparison of the extent of enzymatic reaction of a substrate
in comparison with similarly prepared standards.
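
Comparing an unknown against similarly prepared standards amounts to reading its signal off a standard curve. The Python sketch below is a minimal version, assuming linear interpolation between the two standards that bracket the unknown's signal. Many enzyme immunoassays instead fit a four-parameter logistic curve, and the text does not prescribe either approach. The standard values in the usage block are hypothetical.

```python
from bisect import bisect_left


def concentration_from_standards(standards, signal):
    """Estimate an unknown's concentration from a standard curve.

    standards: (concentration, signal) pairs whose signal rises with
    concentration. The unknown is placed on the curve by linear
    interpolation between the two standards that bracket its signal.
    """
    pts = sorted(standards, key=lambda p: p[1])
    signals = [s for _, s in pts]
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal outside the standard range; dilute or extend the curve")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        return pts[i][0]
    (c0, s0), (c1, s1) = pts[i - 1], pts[i]
    return c0 + (signal - s0) * (c1 - c0) / (s1 - s0)


if __name__ == "__main__":
    # Hypothetical standards: (ng/mL, absorbance) from a chromogenic read-out.
    curve = [(0, 0.05), (10, 0.25), (20, 0.45), (40, 0.85)]
    print(round(concentration_from_standards(curve, 0.35), 2))
```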

Detection may be accomplished using any of a variety of
other immunoassays. For example, by radioactively labeling
the antibodies or antibody fragments, it is possible to detect
PKD1 wild type or mutant peptides through the use of a
radioimmunoassay (RIA) (see, for example, Weintraub, B.,
Principles of Radioimmunoassays, Seventh Training Course on
Radioligand Assay Techniques, The Endocrine Society, March,
1986, which is incorporated by reference herein). The
radioactive isotope can be detected by such means as the use
of a gamma counter or a scintillation counter or by
autoradiography.

It is also possible to label the antibody with a
fluorescent compound. When the fluorescently labeled
antibody is exposed to light of the proper wavelength, its
presence can then be detected due to fluorescence. Among the
most commonly used fluorescent labeling compounds are
fluorescein isothiocyanate, rhodamine, phycoerythrin,
phycocyanin, allophycocyanin, o-phthalaldehyde and
fluorescamine.

The antibody can also be detectably labeled using
fluorescence-emitting metals such as 152Eu, or others of the
lanthanide series. These metals can be attached to the
antibody using such metal chelating groups as
diethylenetriaminepentaacetic acid (DTPA) or
ethylenediaminetetraacetic acid (EDTA).

The antibody also can be detectably labeled by coupling
it to a chemiluminescent compound. The presence of the
chemiluminescent-tagged antibody is then determined by
detecting the presence of luminescence that arises during the
course of a chemical reaction. Examples of particularly
useful chemiluminescent labeling compounds are luminol,
isoluminol, aromatic acridinium ester, imidazole,
acridinium salt and oxalate ester.

Likewise, a bioluminescent compound may be used to
label the antibody of the present invention. Bioluminescence
is a type of chemiluminescence found in biological systems
in which a catalytic protein increases the efficiency of the
chemiluminescent reaction. The presence of a bioluminescent
protein is determined by detecting the presence of
luminescence. Important bioluminescent compounds for
purposes of labeling are luciferin, luciferase and aequorin.

6. EXAMPLE: DETERMINATION OF THE PKD1 INTERVAL VIA GENETIC POLYMORPHISM ANALYSIS

In the Working Example presented herein, genetic linkage
studies are discussed which reduced the
potential PKD1 interval from approximately 750 kb to
approximately 460 kb, thus substantially narrowing the
genomic region in which the gene responsible for ADPKD lies.

6.1 MATERIALS AND METHODS

Sequencing techniques: Sequencing of cDNA clones and
genomic clones was carried out using an Applied Biosystems
ABI 373 automated sequencing machine according to the
manufacturer's recommendations or by manual sequencing
according to the method of Ausubel, F.M. et al., eds., 1989,
Current Protocols in Molecular Biology, Vol. I, Greene
Publishing Associates, Inc., and John Wiley & Sons, New York,
pp. 7.0.1 ff.

Inserts from the cDNA phage clones were excised with
EcoRI and ligated into the appropriate cloning sites in the
polylinker of the pBluescript plasmid (Stratagene). Primers for
sequencing of the plasmid clones were based on the known
sequence of the polylinker. A second set of sequencing
primers was based on the DNA sequences obtained from the
first sequencing reactions. Sequences obtained using the
second set of primers were used to design a third set of
primers, and so on. Both strands of the double-stranded
plasmids were sequenced.

PCR products were sequenced using the dsDNA cycle
sequencing system of GIBCO-BRL (Gaithersburg, MD) according
to the manufacturer's instructions. Prior to sequencing, the
PCR product was purified by passing the DNA twice through a
Centricon column according to the manufacturer's
instructions (Amicon, Beverly, MA, USA). 100-200 ng of each
purified PCR product was used as template in the sequencing
reaction.

Genomic sequences were obtained from PCR products as
well as from subclones of the cosmids. To ensure that
sequence from the correct locus was obtained across the
duplicated region, only cGGG10 and cDEB11 sequence was used
when identifying intron/exon boundaries.

DNA labelling: Double-stranded DNA probes were made by
labelling DNA by the method of Feinberg and Vogelstein, 1983,
Anal. Biochem. 132:6-13. Primers were end-labelled with
[γ-32P]ATP using the method of Ausubel, F.M. et al., eds., 1989,
Current Protocols in Molecular Biology, Vol. 1, Greene
Publishing Associates, Inc., and John Wiley & Sons, New York,
pp. 4.8.2 ff.

PCR conditions: Conditions for the PCR reactions were
determined empirically for each reaction by analyzing an
array of reaction conditions with the following variables:
magnesium concentration (1 mM, 2 mM or 4 mM), annealing
temperature, extension time, primer concentration and primer
concentration ratio.

The fixed conditions were: 

1. extension at 72°C using Taq polymerase, 2.5 U per 100 µl
reaction volume;

2. denaturation at 95°C for 1 minute; and
3. annealing for 30 seconds.
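The empirical optimization described above is a full factorial grid over the variable parameters with the fixed conditions held constant. The Python sketch below enumerates such a grid. Only the magnesium levels and the fixed conditions come from the text; the annealing temperatures, extension times, primer concentrations and primer ratios are hypothetical placeholders, since the values actually tested are not stated.

```python
from itertools import product

# Fixed conditions, as listed in the text.
FIXED = {
    "extension_temp_c": 72,       # Taq extension temperature
    "taq_units_per_100_ul": 2.5,  # enzyme per 100 ul reaction
    "denature_temp_c": 95,
    "denature_time_s": 60,
    "anneal_time_s": 30,
}

# Variable conditions. Only the magnesium levels come from the text; the
# other levels are hypothetical placeholders chosen for illustration.
VARIABLES = {
    "mg_mm": [1, 2, 4],
    "anneal_temp_c": [55, 60, 65],    # hypothetical
    "extension_time_s": [30, 60],     # hypothetical
    "primer_conc_um": [0.2, 0.5],     # hypothetical
    "primer_ratio": [1.0, 2.0],       # hypothetical forward:reverse ratio
}


def condition_grid(fixed=FIXED, variables=VARIABLES):
    """Yield one reaction setup per combination of the variable levels."""
    names = list(variables)
    for levels in product(*(variables[n] for n in names)):
        yield {**fixed, **dict(zip(names, levels))}


grid = list(condition_grid())
print(len(grid))  # 3 * 3 * 2 * 2 * 2 = 72 reactions
```

The grid grows multiplicatively with each added level, which is why, in practice, only a few levels per variable are usually screened at once.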
Primer design: Primers were designed using the computer
program PRIMER.
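The criteria applied by the PRIMER program are not described here. As a rough, assumed stand-in, the Python sketch below computes two quantities commonly checked during primer design, GC content and a Wallace-rule melting temperature (Tm = 2(A+T) + 4(G+C) °C, a rule of thumb for short oligonucleotides), for two primers taken from Table 1. It is not the PRIMER program's algorithm.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature (deg C) by the Wallace rule,
    Tm = 2 * (A + T) + 4 * (G + C); intended only for short oligonucleotides."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for name, primer in [("KG8-F9", "CTGCCGGCCTGGTGTCG"),
                     ("NKG9-R03", "CTCCCAGCCACCTTGCTC")]:
    print(name, len(primer), round(gc_fraction(primer), 2), wallace_tm(primer))
# KG8-F9 17 0.76 60
# NKG9-R03 18 0.67 60
```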

Genetic linkage studies: Genetic linkage studies were
carried out using computerized algorithms (Lathrop, G.M. et
al., 1984, Proc. Natl. Acad. Sci. USA 81:3443-3446; Lathrop,
G.M. and Lalouel, J.-M., 1984, Am. J. Hum. Genet. 36:460-465;
Lathrop, G.M., Lalouel, J.-M., Julier, C. and Ott, J., 1985,
Am. J. Hum. Genet. 37:482-498).
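The cited programs compute full pedigree likelihoods. The core quantity, the LOD score, can be illustrated for the simplified case of phase-known, fully informative meioses: LOD(θ) = log10[θ^R (1 − θ)^NR / 0.5^(R+NR)], where R and NR are recombinant and non-recombinant counts. The Python sketch below uses this simplified model and hypothetical counts; it is not the LINKAGE implementation. By convention, a maximum LOD of 3 or more is taken as evidence of linkage.

```python
import math


def lod(theta, recombinants, nonrecombinants):
    """Two-point LOD score for phase-known, fully informative meioses:
    log10[theta^R * (1 - theta)^NR / 0.5^(R + NR)], for 0 < theta <= 0.5."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    n = recombinants + nonrecombinants
    return (recombinants * math.log10(theta)
            + nonrecombinants * math.log10(1 - theta)
            + n * math.log10(2))


def max_lod(recombinants, nonrecombinants, steps=500):
    """Grid-search the recombination fraction that maximizes the LOD score."""
    thetas = [i / (2 * steps) for i in range(1, steps + 1)]
    best = max(thetas, key=lambda t: lod(t, recombinants, nonrecombinants))
    return best, lod(best, recombinants, nonrecombinants)


# Hypothetical counts: 2 recombinants among 20 scored meioses.
theta_hat, z = max_lod(2, 18)
print(f"theta = {theta_hat:.3f}, LOD = {z:.2f}")  # theta = 0.100, LOD = 3.20
```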

Single-stranded conformational polymorphism analysis (SSCP):

SSCP analysis to detect sequence polymorphisms was
carried out according to the method of Orita et al., 1989,
Genomics 5:874-879. Primers were designed to amplify each
exon (see FIG. 10 and Table 1, below). The 3' end of each
primer was designed to lie approximately 20-50 bp from the
nearest intron/exon boundary so that mutations in the splice
donor and acceptor sites could be detected.
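The primer-placement rule above can be expressed as a small coordinate calculation. The Python sketch below is illustrative only: it interprets "20-50 bp from the boundary" as the number of intronic bases between a primer's 3' end and the exon, assumes 20-nt primers, and uses a hypothetical exon at genomic positions 1001-1150 (1-based, inclusive).

```python
def primer_3prime_windows(exon_start, exon_end, min_gap=20, max_gap=50):
    """Genomic windows (1-based, inclusive) for the 3' ends of the forward
    and reverse primers, so that min_gap..max_gap intronic bases separate
    each primer from the exon and the splice sites stay inside the amplicon."""
    forward = (exon_start - max_gap - 1, exon_start - min_gap - 1)
    reverse = (exon_end + min_gap + 1, exon_end + max_gap + 1)
    return forward, reverse


def amplicon_length_range(exon_start, exon_end, primer_len=20,
                          min_gap=20, max_gap=50):
    """Shortest and longest amplicon (including both primers) for primers
    placed according to primer_3prime_windows."""
    exon_len = exon_end - exon_start + 1
    return (exon_len + 2 * min_gap + 2 * primer_len,
            exon_len + 2 * max_gap + 2 * primer_len)


print(primer_3prime_windows(1001, 1150))   # ((950, 980), (1171, 1201))
print(amplicon_length_range(1001, 1150))   # (230, 290)
```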

Table 1: Primer Sequences from the PKD1 Gene

Primer Name    Sequence (5'-3')          Sense/antisense
KG8-F9         CTGCCGGCCTGGTGTCG         sense
KG8-F11        AGGGTCCACACGGGCTCGG       sense
KG8-F23        CAGGGTGTCCGTGCGTGACTG     sense
KG8-F25        GTCCAGCACTCCTGGGGAGA      sense
KG8-F26        ACGCAAGGACAAGGGAGTAG      sense
KG8-F27        AGTGCCGCGGCCTCCTGAC       sense
KG8-F28        GCTGGCCTAGGCGGCTTCCA      sense
KG8-MF2        CACCCCACGGCTTTGCACT       sense
KG8-MF4        CCCAGGCAGCGAGGCTGTC       sense
KG8-R02        ACACCAGGCCAACAGCGACTG     antisense
KG8-R9         ACAGCCACCAGGAGCAGGCTGA    antisense
[illegible]    [illegible]               sense
[illegible]    [illegible]               antisense
KG8-R27        CGGAGGAGTGAGGTGGGCTCC     antisense
[illegible]    [illegible]               antisense
[illegible]    [illegible]               sense
[illegible]    [illegible]               sense
NKG9-R03       CTCCCAGCCACCTTGCTC        antisense
NKG9-R07       GCAGCTGTCGATGTCCAG        antisense
NKG9-RM2       TCTGTCCAACAAAGGCCTG       antisense

6.2 RESULTS 



It was previously shown that the PKD1 gene maps, by
genetic linkage, to the interval between the polymorphic
genetic markers D16S259 (which lies on the telomeric side of
PKD1) and D16S25 (which lies on the centromeric side of PKD1)
(see Somlo et al., 1992, Genomics 13:152). The smallest
interval between genetic markers, called the PKD1 interval,
was found to be approximately 750 kb (see Germino et al.,
1992, Genomics 13:144). The PKD1 interval was isolated as a
series of forty overlapping cosmid and phage clones. The
cloned DNA contained the entire PKD1 interval with the
exception of two gaps of less than 10 kb and less than 50 kb
(see FIG. 1; Germino et al., Genomics 13:144, 1992).

In the Example presented herein, in order to reduce the
PKD1 interval still further, a systematic search for
additional polymorphic markers was undertaken. Single-stranded
DNA probes (CA)8-15 were hybridized to the set of
clones from the PKD1 interval. The phage clone w5.2 (see
FIG. 1) was found to hybridize to the probe, and the sequence
flanking the (CA)n repeat (the w5.2 repeat) was determined
using phage DNA as a template. Primers for the polymerase
chain reaction (PCR) were designed and used to detect
polymorphism within the w5.2CA repeat. The position of the
w5.2CA repeat is shown in FIG. 2. The w5.2CA repeat was used
in genetic linkage studies in 15 PKD1 families and found to
lie proximal to the PKD1 locus. This experiment reduced the
size of the PKD1 interval to approximately 460 kb, as shown
in FIG. 2.

7. EXAMPLE: IDENTIFICATION OF POTENTIAL PKD1 TRANSCRIPTS

In the Working Example presented herein, transcription
units within the 460 kb PKD1 interval (FIG. 2) defined in
Section 6, above, were identified. The interval was found to
contain a maximum of 27 transcriptional units (TUs), which
together spanned approximately 300 kb.

7.1 Materials and Methods

cDNA library screening: cDNA libraries were prepared from
several sources including EBV-transformed lymphocytes,
teratocarcinoma tissue, fetal kidney and HeLa cells. In
addition, a human adult kidney library was purchased from
Clontech Inc. (San Diego, CA).

Total RNA from each tissue was prepared by the
guanidinium chloride method. First-strand cDNA was
synthesized using random six-base oligonucleotides by the
method of Zhou et al., J. Biol. Chem. 267:12475 (1992). EcoRI
sites within the cDNA were blocked by DNA methylase. The
cDNA was flush-ended with T4 kinase, and EcoRI linkers were
added with DNA ligase. The cDNA was cleaved with EcoRI and
ligated into either bacteriophage lambda-gt10 or lambda-ZAP
(Stratagene). The phage were packaged with high-efficiency
packaging extract (Stratagene). At least one million primary
clones were plated. The library was amplified 100-fold and
stored at 4°C.
At least 500,000 plaques of each library were screened
with each cosmid clone at a density of 25,000 plaques per
75 mm diameter plate. Duplicate filter lifts were made of each
plate (Ausubel, supra). The radiolabelled probes were
incubated with an excess of unlabelled, denatured human DNA
and then added to the library filters in a sodium phosphate
buffer at 65°C for 16 hours. The filters were washed in
2x SSC at 65°C for 1 hour and in 0.1x SSC, 0.1x SDS at 65°C for
1 hour. Kodak XAR-5 film was exposed to the library filters for
4-16 hours. Duplicate positives were picked and replated at
a density of approximately 100-500 plaques per plate. Filter
lifts of these secondary plates were made and hybridized as
for the primary lifts; pure, isolated plaques were obtained and
inoculated into 50 ml cultures, and the phage DNA was purified.

Sequencing techniques: Techniques were as described in
Section 6.1, above.

7.2 Results 

To identify transcribed sequences within the PKD1
interval (FIG. 2), the cosmid and phage clones from the
interval were hybridized to cDNA libraries made from a
variety of human tissues including fetal and adult kidney,
teratocarcinoma, adult liver, lymphoblast, HeLa, and adult
brain. More than 100 hybridizing cDNA clones were
identified. These clones were subcloned into pBluescript
plasmids and sequenced. The sequence data, combined with
hybridization data (between cDNA clones and genomic clones),
allowed the cDNA clones to be assigned to a maximum of 27
transcription units, as described below.

Namely, hybridization between two cDNA clones was taken as
evidence that the clones are part of the same transcription
unit. Similarly, sequence identity of greater than 25 bp
between cDNA clones was taken as evidence that the clones
were part of the same transcription unit.
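The two grouping criteria above can be applied transitively (if clone A groups with B, and B with C, all three fall in one unit). That transitive reading is an assumption, but it maps naturally onto a union-find (disjoint-set) structure, as in the Python sketch below. The clone names and evidence are hypothetical, chosen only to show the mechanics.

```python
class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def group_into_units(clones, hybridizing_pairs, shared_identity_bp,
                     min_identity_bp=25):
    """Merge clones into putative transcription units.

    hybridizing_pairs: pairs of clones that cross-hybridize.
    shared_identity_bp: {(clone_a, clone_b): length of shared identical
    sequence in bp}; pairs sharing more than min_identity_bp are merged.
    """
    uf = UnionFind(clones)
    for a, b in hybridizing_pairs:
        uf.union(a, b)
    for (a, b), length in shared_identity_bp.items():
        if length > min_identity_bp:
            uf.union(a, b)
    units = {}
    for clone in clones:
        units.setdefault(uf.find(clone), []).append(clone)
    return sorted(sorted(unit) for unit in units.values())


# Hypothetical clones and evidence, for illustration only.
clones = ["c1", "c2", "c3", "c4", "c5", "c6"]
hybridizing = [("c1", "c2"), ("c4", "c5")]
shared = {("c2", "c3"): 40, ("c5", "c6"): 12}
print(group_into_units(clones, hybridizing, shared))
# [['c1', 'c2', 'c3'], ['c4', 'c5'], ['c6']]
```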

Table 2, below, lists these units (a-z, aa) by the name 
of the longest clone. 






WO 95/34573 



PCT/US95/07079



Table 2

Putative Transcriptional Unit Sequences Isolated From the PKD1 Region

CANDIDATE GENES IN THE PKD1 REGION

      Clone         Insert Size (kb)   cDNA Libraries     Motif
a.    20.7          2.1                cy, terat
b.    SazD          2.7                cy                 G-protein β subunit-like
c.    SazB          2.2                cy, terat          scERV from yeast
d.    Saz10         4.0                cy, lym
e.    Saz13         1.5                cy, terat          tandem 120 amino acid repeat; ZO1 family
f.    Saz20         5.5                cy, lym, terat
g.    KG8           3.4                lym
h.    NKG9          1.8                lym
i.    NKG10         2.8                lym
j.    NKG11         2.4                lym
k.    Nik4          0.9                kid
l.    Nik7          2.3                lym, terat         rab gene motif
m.    KG3           3.8                terat, cy          G-protein β subunit-like
n.    [illegible]   0.7                                   ankyrin repeat
o.    [illegible]   0.6                kid
p.    KM17          1.6                terat, cy          [illegible]
q.    Nik10         1.6                lym
r.    KG5           2.6                cy                 zinc-finger protein
s.    KG1           1.1                kid                DNase
t.    KG6           3.4                kid, cy, lym       human homolog of mouse RNSP1 gene
u.    Nik3          3.2                terat, lym, cy     *
v.    Nik2          3.4                terat, lym, cy     *
w.    Nik1          0.8                kid                *
x.    Nik8          1.6                lym                *
y.    KG17          2.2                lym
z.    AJ1           1.4                cy                 cyclin-F homolog
aa.   MAR1          2.0                kid                MDR-like

* u, v, w and x are part of an 8 kb transcriptional unit (nik 823) which produces an MDR-like channel.
MAR1 is another member of the gene family. ATP-dependent transporter cyclin proton-channel of
vacuolar proton ATPase

cDNA library from which the clone was obtained: cy = cyst; terat = teratocarcinoma;
lym = lymphoblast; kid = kidney
Thus, by virtue of their genomic localization, these 27
transcription units were considered to be candidate genes
for PKD1. The total transcribed cDNA in the 27 transcription
units equalled about 60 kb.

The sequence of each clone was compared with sequences
deposited in the public databases GenBank, EMBL and
SwissProt. Several of the cDNA clones contained sequences
predicted to code for known protein motifs. Because so
little was known of the molecular basis of ADPKD, none of the
candidate genes could be ruled out by virtue of sequence
motifs.

8. PKD1 INTERVAL NORTHERN ANALYSIS
In the Working Example presented herein, the transcriptional
expression patterns of the TUs described in Section 7, above,
were analyzed.

8.1 MATERIALS AND METHODS
Northern blot analysis: Poly(A)+ RNA (2 µg) from heart, brain,
placenta, lung, liver, skeletal muscle, kidney and pancreas
was hybridized with radiolabelled cDNA probes from the TUs
within the PKD1 interval, under standard conditions.

8.2 RESULTS

Inserts from the cDNA clones of the TUs described in
Section 7 and listed in Table 2, above, were used to probe
Northern blots containing total RNA and poly(A)-enriched RNA
from normal human organs and from 8 to 10 kidneys
removed from patients with ADPKD.

The expression profile was compared with the pattern of
pathology in ADPKD to prioritize candidates for further
characterisation. The Northern analysis demonstrated that 26
of the 27 TUs in the PKD1 interval were expressed in kidney;
the exception was Nik9. Nik9 mRNA was found to be
abundant in human brain but expressed at a very low level in
fetal and adult human kidney. These data therefore
indicated that Nik9 is not the PKD1 gene. No consistent
differences were observed between normal and ADPKD kidneys
for any transcript.

9. EXAMPLE: PKD1 INTERVAL MUTATION SCREENS
A systematic search was undertaken to detect mutations
in ADPKD patients in the transcribed regions listed in Table
2. The mutation screen used several independent techniques.
Southern blot analysis of patient DNA digested with at least
three different restriction endonucleases was performed.
Several differences between the restriction patterns were
detected, but none was found only in patients with ADPKD.
Single-stranded conformational polymorphism analysis was
carried out using cDNA isolated from transformed lymphocytes
of patients as a template. A large number of allelic
differences was found, but none was found to alter the
deduced protein product. Sequence analysis of the
KG5 cDNA was carried out in seven ADPKD patients and one
normal individual. The deduced coding region of 2.6 kb was
sequenced using cDNA, made by reverse transcription from
patient transformed-lymphocyte mRNA, as a template. The cDNA
was amplified by PCR in a series of overlapping sections and
the PCR products were sequenced. No sequence differences were
detected between patients and normal individuals. In this
way, more than 80% of the coding DNA in the transcription
units was scanned, and no mutations were found in PKD1
patients. These experiments excluded the scanned segments of
the transcription units with a likelihood of 95%, based on the
reasonable assumption that no single ADPKD mutation accounts
for >70% of all ADPKD cases.

Thus, the following transcription units were excluded:
SazB, SazD, Saz13, KG3, KG5, KG1, Saz20, KM17, Nik1, Nik2,
Nik3, Nik8, KG17, Nik7 and MAR1. These excluded transcripts
represent >80% of the combined identified coding sequences in
the PKD1 region.

It has previously been noted that de novo mutation to
ADPKD accounts for at least 1% of cases. Two mechanisms have
been shown to account for the vast majority of new mutation
rates of this order. First, the coding region may be large.
Duchenne muscular dystrophy (DMD) provides an example of this
situation: the dystrophin gene, which is mutated in DMD, has a
transcript of approximately 14 kb. About 30% of DMD cases
arise by de novo mutation. The second mechanism that may
account for a high new mutation rate is the presence of an
unstable repetitive element. Unstable trinucleotide repeats
in which the repeat sequence contains >50% C and G have been
shown to cause fragile X syndrome, Huntington's disease
and myotonic dystrophy. In two of these diseases, high
mutation rates or the appearance of progressively more severe
disease in successive generations (anticipation) have been
documented.

A systematic search for trinucleotide repeats in the
PKD1 interval was undertaken. Single-stranded probes (15-25
nucleotides) containing all possible combinations of
trinucleotide repeats were synthesized, radiolabelled and
hybridized to Southern blots containing the complete set of
clones comprising the PKD1 interval. The hybridization and
washing conditions were adjusted to allow detection of all
perfect repeats of 15 nucleotides or more. Eight separate
clusters of trinucleotide repeats within the PKD1 interval were
found. Primers were designed so that the trinucleotide
repeat arrays could be amplified by PCR and size-fractionated
on polyacrylamide gels. No differences were found between
ADPKD patients and controls.
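The screen above was done by hybridization. Once genomic sequence is available, an analogous in-silico scan for perfect trinucleotide repeats of 15 nucleotides or more can be written directly, as in the Python sketch below. This is an illustrative substitute, not the method used; it treats any run whose bases repeat with period 3 as a repeat, skips homopolymers, and uses a made-up test sequence.

```python
def perfect_trinucleotide_repeats(seq, min_length=15):
    """Find uninterrupted period-3 repeat runs of at least min_length nt.

    Returns (start, end, unit) tuples with 0-based, end-exclusive
    coordinates. The reported unit is the first three bases of the run, so
    it may be any rotation of the repeat (e.g. AGC rather than CAG).
    Homopolymer runs (AAA...) are skipped.
    """
    seq = seq.upper()
    hits = []
    k = 3
    while k < len(seq):
        if seq[k] != seq[k - 3]:
            k += 1
            continue
        run_start = k
        while k < len(seq) and seq[k] == seq[k - 3]:
            k += 1
        start, end = run_start - 3, k  # seq[start:end] repeats with period 3
        unit = seq[start:start + 3]
        if end - start >= min_length and len(set(unit)) > 1:
            hits.append((start, end, unit))
    return hits


example = "GATC" + "CAG" * 6 + "TTGA"
print(perfect_trinucleotide_repeats(example))  # [(4, 22, 'CAG')]
```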

Additionally, two other screening methods were attempted
for the identification of trinucleotide expansions in the
PKD1 interval. Southern blots of DNA from normal and
affected individuals were probed with inserts containing the
repeats. This revealed no polymorphisms. Further, multiply
restricted DNA samples (RsaI/Sau3A/HinfI) were probed
with trinucleotide repeat oligonucleotides. Although myotonic
dystrophy and fragile X mental retardation patients could be
identified via such methods, it was not possible to identify
any common pattern in ADPKD patients.
The cDNA clones Nik1, Nik2, Nik3 and Nik8 were found to
hybridize to an 8 kb transcript present in kidney. These
clones were assumed to be part of the same transcript. PCR
products that bridged the three gaps in sequence between the
four clones were obtained using primers based on sequences
within the four cDNA clones. In this way, approximately 8 kb of
the transcribed DNA sequence of the gene represented by Nik1,
Nik2, Nik3 and Nik8 was obtained. Because the coding region
is large, the gene was expected to have a high spontaneous
mutation rate and therefore to be a good candidate for the
PKD1 gene. A detailed exon-by-exon search of the gene,
however, revealed no evidence of mutations in ADPKD patients.
This left only one TU within the region that was considered
large enough to be a reasonable candidate for the PKD1 gene.
The characterization of clones and sequences within this TU,
part of the putative PKD1 gene, is described below in the
Working Examples presented in Sections 10 and 11.

10. EXAMPLE: SSCP Analysis of ADPKD Patients
In the Working Example presented herein, an SSCP
analysis of genomic DNA amplified from normal individuals and
ADPKD patients was conducted, which identified ADPKD-specific
allelic differences that map to the single remaining candidate
gene of the PKD1 interval described, above, in the
Working Example presented in Section 9.

10.1 Materials and Methods
SSCP Analysis: Single-stranded conformational polymorphism
(SSCP) analysis was performed as follows: 50 ng of genomic DNA
was amplified by PCR under standard conditions in a reaction
volume of 20 µl. Ten microliters of the amplified product was
added to 90 µl of formamide buffer, heated at 97°C for 4-5
minutes, and cooled on ice. Four microliters of the reaction
mixture was loaded on a polyacrylamide gel (10%, 50:1
acrylamide:bisacrylamide) containing 10% glycerol. The gel
was run at 4°C for 12 hours at 10 W in 0.5x TBE
buffer. The gel was dried and exposed to a Molecular Dynamics
PhosphorImager screen for 4 to 16 hours.

Intron/Exon Mapping: Primers designed from cDNA clones were
used to PCR-amplify genomic DNA sequences. Amplified
products were sequenced using standard methods. Sequences
that differed from the cDNA sequences indicated
intron sequences.
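The logic of this mapping step (genomic stretches identical to the cDNA are exonic; genomic stretches absent from the cDNA are intronic) can be sketched with Python's standard `difflib.SequenceMatcher`, used here as a simple stand-in for a dedicated spliced-alignment tool. The toy exon and intron sequences and the 20-nt minimum block size are invented for illustration; the sketch also flags whether each putative intron begins with GT and ends with AG, the canonical splice dinucleotides.

```python
from difflib import SequenceMatcher


def map_exons(cdna, genomic, min_block=20):
    """Approximate exon/intron layout from identical blocks shared by a cDNA
    and its genomic sequence (0-based, end-exclusive genomic coordinates).

    Returns (exons, introns); each intron is (start, end, has_gt_ag), where
    has_gt_ag flags the canonical GT...AG splice dinucleotides.
    """
    matcher = SequenceMatcher(None, cdna, genomic, autojunk=False)
    exons = [(m.b, m.b + m.size)
             for m in matcher.get_matching_blocks() if m.size >= min_block]
    introns = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        if next_start > prev_end:
            canonical = (genomic[prev_end:prev_end + 2] == "GT"
                         and genomic[next_start - 2:next_start] == "AG")
            introns.append((prev_end, next_start, canonical))
    return exons, introns


# Toy sequences, invented for illustration.
exon1 = "ATGGCTAGCTTACGGATCCAAGTTCGACGT"
exon2 = "CCTGAAGGTCATTGCAGCCTAAGCTTGTGA"
intron = "GTAAG" + "T" * 33 + "AG"
genomic = "G" * 10 + exon1 + intron + exon2 + "C" * 10
print(map_exons(exon1 + exon2, genomic))
# ([(10, 40), (80, 110)], [(40, 80, True)])
```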

PCR Amplification: Procedures for amplification were as
described in Section 6.1, above.

10.2 Results
Because the large size of the putative
KG8/NKG9/NKG10/NKG11 transcript makes it a likely site for
mutation, the intron/exon structure of part of the gene
represented by KG8 and NKG9 was determined so that an
exon-by-exon search for mutations could be conducted. The
exon/intron structure analysis allowed PCR primers to be
designed for the amplification of several exons of the PKD1
gene.

These primers were used to PCR-amplify genomic DNA and
to perform SSCP analysis of ADPKD patients and normal
individuals. In two ADPKD patients, SSCP patterns were observed
that showed allelic differences. Both patients were
heterozygous for an SSCP variant that was not seen in a large
number of individuals from the normal population (FIGS. 3A-3B).
In samples from these two individuals, 4 bands are visible,
instead of the 2 single-strand bands seen in samples from
normal individuals. The 4 bands are of equal intensity and are
presumed to comprise two allelic sense strands and two allelic
antisense strands.

Thus, the results discussed in this Example, coupled
with the analyses reported in the Examples presented
in Sections 6 through 9, above, provide positive correlative
evidence that the gene corresponding to the putative
transcription unit of which the clones KG8, NKG9, NKG10 and
NKG11 are believed to be a part is the PKD1 gene.

11. EXAMPLE: MOLECULAR CHARACTERIZATION OF THE PKD1 GENE
In this Example, the complex structure of the PKD1 gene
and gene product is described. Included herein is a
description of the PKD1 gene structure, the nucleotide
sequence of the entire coding region of the PKD1 transcript,
as well as the amino acid sequence and domain structure of
the PKD1 gene product. This description not only represents
the first elucidation of the entire PKD1 coding sequence, but
also corrects errors in the portion of the PKD1 coding region
that had previously been reported. Also, an ADPKD-causing
mutation within the PKD1 gene which results in a frameshift
is identified. Further, the strategy used to characterize
this extensive and difficult nucleic acid region is
summarized.

A portion of the nucleotide sequence corresponding, in
large part, to the 3' end of the PKD1 gene had recently been
reported (European Polycystic Kidney Disease Consortium
[hereinafter abbreviated EPKDC], 1994, Cell 77:881-894).
Specifically, the terminal 5.6 kb of the PKD1 transcript was
studied and an open reading frame of 4.8 kb was reported.
The peptide encoded by this putative open reading frame, which
would correspond to the carboxy-terminal portion of the PKD1
protein, did not reveal any homologies to known proteins and,
if this derived amino acid sequence was in fact part of the
PKD1 protein, its sequence did not suggest a function for the
PKD1 gene product.

Because of this lack of revealing information, and because
only a small percentage of ADPKD-causing mutations appear to
reside within the 3' end of the PKD1 gene, the
characterization of the 5' end of the gene and a more
complete analysis of the PKD1 gene and gene product were
greatly needed.

As acknowledged by the EPKDC (EPKDC, 1994, Cell 77:881-894),
however, the elucidation of the complete PKD1 coding
sequence presents major problems. Unlike the 3' end of the
PKD1 gene, the 5' two-thirds of the gene appear to be
duplicated several times at other genomic positions.
Further, at least some of these duplications are transcribed.
Thus, great difficulties arise when attempting to distinguish
sequence derived from the authentic PKD1 locus from
sequence obtained from the duplicated PKD1-like loci.

11.1. MATERIALS AND METHODS

11.1.1. GENOMIC CLONES

The human P1 phage named PKD 1521 was isolated from a
human P1 library using primers from the adjacent TSC2 gene.
The first screen used primers F33 (tcttctccaacttcacggctg)
and R32 (aaccagccaggttttggtcct), followed by F38
(caagtccagctcctctccc) and R40 (gctctttaaggcgtccctc), and the
library was ultimately screened with primers in the KG8 gene
(F9/R5). KG8-F9 is listed in Table 1; KG8-R5 is
5'-gcgctttgcagacggtaggcg-3'.

The cosmid cGGG10 has been previously described (Germino,
G.G., Weinstat-Saslow, D., Himmelbauer, H., Gillespie,
G.A.J., Somlo, S., Wirth, B., Barton, N., Harris, K.L.,
Frischauf, A.M. and Reeders, S.T. (1992) Genomics
13:144-151). The cosmid cGGG10 was mapped using various
restriction enzymes as described by the manufacturers. A
random library of the cosmid was constructed by cloning
sheared DNA fragments into the SmaI site of pUC19. Initial
sequence assembly for the cosmid cGGG10 was performed on
forward and reverse sequences of approximately 1000 randomly
cloned fragments, and a preliminary map was constructed using
the restriction map of the cosmid. Directed subclones of
cGGG10 were made in the plasmid pBluescript (Stratagene) in
order to create sequencing islands at specific physical
locations. These large subclones from cGGG10 were then
digested with more frequently cutting enzymes and cloned into
M13mp19 and mp18. In addition, where gaps were found in cloned
regions, directed sequencing was performed from the flanking
regions to join the anchored contigs. A contig of 34.3 kb was
constructed, with two gaps in what appear to be highly
repetitive regions with no identifiable coding sequence.

cDEB11 has been described previously (Germino, G.G.,
Weinstat-Saslow, D., Himmelbauer, H., Gillespie, G.A.J.,
Somlo, S., Wirth, B., Barton, N., Harris, K.L., Frischauf,
A.M. and Reeders, S.T. (1992) Genomics 13:144-151). A random
library was constructed with sheared cDEB11 DNA cloned into
the SmaI site of pUC19. This cosmid was sequenced to at
least 2-fold coverage.
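The depth of random ("shotgun") sequencing can be related to expected gaps with the standard Poisson (Lander-Waterman) approximation: average coverage c = N·L/G for N reads of length L over a target of G bases, and the expected fraction of bases covered by no read is about e^(-c). The Python sketch below applies this to 2-fold coverage; the 40 kb insert size and 400-nt read length are hypothetical values, not figures from the text.

```python
import math


def fold_coverage(n_reads, read_len, target_len):
    """Average depth c = N * L / G for N reads of length L over G bases."""
    return n_reads * read_len / target_len


def uncovered_fraction(coverage):
    """Expected fraction of bases hit by no read under the Poisson
    (Lander-Waterman) approximation: e**(-c)."""
    return math.exp(-coverage)


def reads_needed(coverage, read_len, target_len):
    """Reads required to reach a target average coverage."""
    return math.ceil(coverage * target_len / read_len)


# Hypothetical numbers: a 40 kb cosmid insert and 400-nt reads.
n = reads_needed(2, 400, 40_000)
print(n, fold_coverage(n, 400, 40_000), round(uncovered_fraction(2), 3))
# 200 2.0 0.135
```

At 2-fold coverage roughly 13.5% of bases would be expected to remain unsampled, which illustrates why random-library data are generally supplemented with directed sequencing, as was done to close gaps in cGGG10.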

The sequencing was done by cycle sequencing and run on
ABI machines following the manufacturer's instructions, with
modifications as described below. Because of the difficulty
of sequencing certain regions, the standard sequencing
chemistry used with the ABI machines had to be modified.
Both dye-terminator and dye-primer chemistries were used, as
appropriate for the region being sequenced. Different
polymerases and different melting and polymerization
conditions were also used in order to optimise the quality of
the sequence. When sequencing across the CpG island at the
5' end of the PKD1 gene, the best results were obtained by
adding 5% DMSO to the polymerization step and sequencing
single-stranded templates.

11.1.2. cDNA LIBRARY SCREENING
The first cDNA used to screen libraries was KG8, which
maps to the unique region of the PKD1 locus and was recovered
from an adult lymphocyte library. In order to complete the
rest of the PKD1 transcript, fourteen new cDNAs were
sequenced to completion, four cDNAs were partially sequenced
and an additional 20 cDNAs were mapped against cGGG10.
Additional data were obtained from RT-PCR products of the
renal cell carcinoma cell line SW839 (ATCC).

Overlapping partial cDNAs described below were isolated
from lymphocyte and fetal kidney libraries. In this way, a
14 kb transcript was assembled, starting from the 3' end,
until the CpG island was reached. It is assumed that the
5' end of the PKD1 transcript has been located: no clones
further upstream were recovered upon further screening of the
cDNA libraries that had provided the majority of the cDNAs
used to assemble the full-length PKD1 cDNA.

The cDNAs FK7 and FK11 were recovered from a fetal
(gestational age 14-16 weeks) kidney cDNA library using KG8
cDNA as a probe. This library was constructed with the
SuperScript Lambda System (Gibco/BRL), using oligo(dT)-primed
cDNA. FK7 and FK11 were recovered as SalI inserts.
The cDNAs designated BK156, BK194, UN49 and UN52 were
recovered from a lymphocyte cell library using FK7 as a probe.
UN34 was recovered from the same library by hybridizing with a
ScaI-SalI 5' end probe of FK7. UN53, UN54 and UN59 were
recovered from the same lymphocyte library (M. Owen
laboratory, ICRF; Dunne, PhD thesis, 1994) by double
screening for clones that were negative when screened with
an FK7 probe and positive when screened with BK156 and UN52.
The cDNA NKG11 was recovered from a lymphocyte library
screened with cGGG10 and was described previously (Germino,
G.G., Weinstat-Saslow, D., Himmelbauer, H., Gillespie,
G.A.J., Somlo, S., Wirth, B., Barton, N., Harris, K.L.,
Frischauf, A.M. and Reeders, S.T. (1992) Genomics
13:144-151). The cDNA named Fhkb21 was obtained from a
Clontech fetal kidney library using BK156 as a probe. MSK3 was
obtained by probing an adult kidney library (Clontech) with a
probe from the 5' end of KG8. MSK4 was obtained by nested
RT-PCR using primers spanning exons 7-8 and exons 13-14,
followed by a second round of PCR with internal primers in
exon 8 and exon 13.

11.1.3. cDNA SEQUENCING
The cDNAs were sequenced to 5-fold coverage by primer
walking and/or subcloning small fragments into M13 or
pBluescript. All cDNA sequences were compared to the cGGG10
cosmid sequence to assess whether they were from the correct
locus and to determine intron/exon boundaries. Discrepancies
were resequenced to determine whether the differences were
genuine. Some of the cDNAs described above were clearly
different from the genomic sequence, suggesting that these
cDNAs were encoded by another locus.




WO 95/34573 



PCT/US95/07079



MSK3, FK7 and FK11, which were obtained using a PKD1-specific probe (KG8), were found to be 100% identical to the genomic sequence. The cDNA UN49, which showed 99% identity, is possibly PKD1-specific. BK241, BK194, UN52, UN53, UN54, UN59, BK156, Fhkb21 and NKG11 were 96-98% identical to the cGGG10-defined exon sequence, and thus were assumed to have originated from the duplicated loci. In general, the differences between genomic and cDNA sequences were nucleotide differences scattered throughout the cDNA sequence. One exception is BK194, which has an extra CAG at position 1863 of the previously published partial sequence that arose from alternative splicing of exon 33. Another exception is BK241, which has an insertion, as a tandem repeat, of the sequence TTATCAATACTCTGGCTGACCATCGTCA at position 1840 of the previously published sequence (European PKD1 Consortium). This sequence was not included in the authentic, full-length PKD1 cDNA because it arose from the duplicated loci and would produce a frameshift in the coding region of the PKD1 transcript. Except for BK241, cDNAs in the UN and BK series that overlap with each other are more similar to each other than to the genomic sequence.
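The locus assignments above rest on percent identity between each cDNA and the cGGG10 exon sequence. A minimal sketch of that comparison follows. It assumes the two sequences are already aligned without gaps (real comparisons need an aligner first), and it uses the identity bands reported above: 100% for PKD1-specific, 99% for possibly PKD1-specific, and lower values (96-98% in practice) for the duplicated loci. The function names and toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two equal-length, gap-free aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal aligned length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


def classify_cdna(identity: float) -> str:
    """Assign a locus class using the identity bands reported in the text."""
    if identity == 100.0:
        return "PKD1-specific"
    if identity >= 99.0:
        return "possibly PKD1-specific"
    # Observed values for the duplicated loci were 96-98%.
    return "likely from a duplicated PKD1-like locus"


# Hypothetical 100-nt genomic exon segment and three toy "cDNAs" aligned to it.
genomic = "ACGT" * 25
one_diff = "T" + genomic[1:]       # 1 mismatch (genomic[0] is "A")
three_diff = "TTT" + genomic[3:]   # 3 mismatches (genomic[0:3] is "ACG")

for name, seq in [("identical", genomic), ("one_diff", one_diff), ("three_diff", three_diff)]:
    pid = percent_identity(seq, genomic)
    print(f"{name}: {pid:.1f}% -> {classify_cdna(pid)}")
```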

All sequence assembly was performed using the Staden package XBAP (Dear, S. and Staden, R. (1991) Nucleic Acids Res. 19:3907-3911).

11.1.4. PROTEIN HOMOLOGY SEARCHES 
The PKD1-derived amino acid sequence was subjected to various sequence analysis methods (Koonin, E.V., Bork, P. and Sander, C. (1994) Yeast chromosome III: new gene functions. EMBO J. 13:493-503). To identify homologues, initial database searches (SWISSPROT, PIR, GENPEPT, TREMBL, EMBL, GENBANK, NRDB) were performed using the BLAST series of programs (Altschul, S.F. and Lipman, D.J., 1990, Proc. Natl. Acad. Sci. USA 87:5509-5513), applying a filter for compositionally biased regions (Altschul, S.F. et al., 1994, Nat. Genet. 6:119-129). By default, the BLOSUM62 amino acid exchange matrix was used (Henikoff, S. and Henikoff, J.G. (1993) Proteins 17:49-61). To reveal additional candidate proteins that might be homologous to PKD1, the BLOSUM45 and PAM240 matrices were also applied. Putative homologues with a BLAST p-value below 0.1 were studied in detail. Multiple alignments of the candidate domains were carried out using CLUSTAL W (Thompson, J.D., Higgins, D.G. and Gibson, T. (1994) Nucleic Acids Res. 22:4673-4680), and patterns (Rohde, K. and Bork, P. (1993) Comput. Appl. Biosci. 9:183-189), motifs and profiles (Gribskov, M., McLachlan, A.D. and Eisenberg, D. (1987) Proc. Natl. Acad. Sci. USA 84:4355-4358) were derived from them. With all of these constructs, iterative database searches were performed: the results of each round of searching were used to improve the multiple alignments, which were then used for the next round of database searches. The final multiple alignment containing all retrieved members of a module family was then used as input for secondary structure prediction (Rost, B. and Sander, C. (1994) Proteins 19:55-72).
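Because the same query was searched with three scoring matrices (BLOSUM62, BLOSUM45 and PAM240) and only hits with p < 0.1 were followed up, the hit lists have to be pooled somehow. The sketch below shows one way to do that: it keeps each subject's best p-value across runs and records which matrices reported it. The input format, the subject names and the p-values are all hypothetical; a real pipeline would parse actual BLAST output.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, float]  # (subject identifier, p-value); hypothetical format


def pool_candidates(runs: Dict[str, Iterable[Hit]], cutoff: float = 0.1) -> List[Tuple[str, float, List[str]]]:
    """Merge hits from searches run with different scoring matrices.

    Keeps every subject with at least one p-value below `cutoff`, records its
    best p-value and the matrices that reported it below the cutoff, and sorts
    the result by best p-value.
    """
    best: Dict[str, float] = {}
    seen_in: Dict[str, List[str]] = {}
    for matrix, hits in runs.items():
        for subject, p in hits:
            if p < cutoff:
                best[subject] = min(p, best.get(subject, p))
                seen_in.setdefault(subject, []).append(matrix)
    return sorted(((s, best[s], seen_in[s]) for s in best), key=lambda t: t[1])


# Hypothetical search results for illustration only.
runs = {
    "BLOSUM62": [("protein_A", 1e-5), ("protein_B", 0.3)],
    "BLOSUM45": [("protein_B", 0.04), ("protein_C", 0.2)],
    "PAM240": [("protein_A", 1e-4), ("protein_D", 0.09)],
}
for subject, p, matrices in pool_candidates(runs):
    print(subject, p, matrices)
```

Candidates such as protein_D, which passes the cutoff under only one matrix, are the kind that the additional matrices are meant to surface.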

11.1.5. SSCP ANALYSIS

Single-strand conformation polymorphism (SSCP) analysis was performed as follows: 50 ng of total genomic DNA was amplified by PCR. In addition to the genomic DNA, each PCR reaction contained 1 picomole of each primer (see below), 0.1 µl [32P]dATP (Amersham) and 0.2 µl AmpliTaq (Pharmacia) in PCR buffer with a final Mg2+ concentration of 1.5 mM, in a final volume of 20 µl. The amplification was performed for 25 cycles, each consisting of 94°C for 30 seconds, 60°C for 30 seconds, and 72°C for 60 seconds.
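Writing the cycling program down as data makes the total hold time easy to check: 25 × (30 + 30 + 60) s = 3000 s, i.e. 50 minutes of holds. The `Step` representation below is a hypothetical convenience, not a thermocycler file format. The sketch ignores ramp times and any initial or final holds, since none are specified above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


# Cycling conditions as given in the SSCP protocol above.
CYCLE: List[Step] = [
    Step("denature", 94, 30),
    Step("anneal", 60, 30),
    Step("extend", 72, 60),
]
N_CYCLES = 25


def cycling_time_seconds(cycle: List[Step], n_cycles: int) -> int:
    """Sum of hold times over all cycles (ramp times are not modelled)."""
    return n_cycles * sum(step.seconds for step in cycle)


total = cycling_time_seconds(CYCLE, N_CYCLES)
print(f"{N_CYCLES} cycles x {sum(s.seconds for s in CYCLE)} s = {total} s ({total / 60:.0f} min)")
```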

Intronic primers F25 and Mill-1R were used for the initial SSCP evaluation. The fragment amplified with these primers overlaps with the 5' end of KG8. Subsequently, primers F31 and R35 were used to amplify the fragment used to sequence the PKD1 mutation.
Primers:
F25 (5' TCGGGGCAGCCTCTTCCTG 3');
Mill-1R (5' TACAGGGAGGGGCTAGGG 3');
F31 (5' TGCAACTGCCTCCTGGAGG 3');
R35 (5' GGTCTGTCTCTGCTTCCC 3').
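As a quick sanity check on the primer list, the sketch below computes each primer's length, GC fraction and a rough melting temperature by the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C. The Wallace rule is a standard back-of-the-envelope estimate for short oligonucleotides. It is not a method stated in this document, and nearest-neighbour calculations are more reliable.

```python
PRIMERS = {
    "F25": "TCGGGGCAGCCTCTTCCTG",
    "Mill-1R": "TACAGGGAGGGGCTAGGG",
    "F31": "TGCAACTGCCTCCTGGAGG",
    "R35": "GGTCTGTCTCTGCTTCCC",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the oligonucleotide."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace rule: Tm ~ 2*(A+T) + 4*(G+C) degrees C (rough estimate for short oligos)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(f"{name:8s} {len(seq):2d} nt  GC {gc_fraction(seq):.0%}  Tm ~{wallace_tm(seq)} C")
```

By this rough rule the four primers fall between 58 and 64 °C, which is broadly in line with the 60 °C annealing step used above.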

One microliter of each sample was diluted into loading dye (95% formamide, 20 mM NaOH, 1 mM EDTA, xylene cyanol, bromophenol blue), denatured at 98°C for 5 minutes, cooled on ice and loaded onto a 10% (50:1 acrylamide:bisacrylamide) polyacrylamide gel containing 10% glycerol. The gel was run at 4°C and 50 watts for 3 hours. Exposure was overnight on phosphorimager plates.

Amplified DNA from the one individual with a variant pattern was then reamplified using primers KG8-F31 and KG8-R35 and the PCR conditions described above. Both strands of the reamplified product were then sequenced using standard procedures for cycle sequencing of PCR products, with incorporation of [32P]dCTP.

11.2. RESULTS
A series of overlapping cosmid clones spanning the predicted PKD1 region has been described (Germino, G.G., Weinstat-Saslow, D., Himmelbauer, H., Gillespie, G.A.J., Somlo, S., Wirth, B., Barton, N., Harris, K.L., Frischauf, A.M. and Reeders, S.T. (1992) Genomics 13:144-151). The integrity of the cosmid contig was confirmed by long-range restriction mapping and genetic linkage analysis of polymorphic sequences derived from the cosmids. Three cosmids (cGGG1, cGGG10 and cDEB11, from centromere to telomere) form a contig that includes the 3' end of the adjacent gene, TSC2 (cDEB11), and spans over 80 kilobases centromeric. At the proximal end of cGGG10 there is a CpG island, represented by the NotI site N54T (Himmelbauer; FIG. Z1A).

In order to identify transcripts from the region, the cosmid clones were hybridized to a set of five cDNA libraries. KG8, a cDNA corresponding to the distal 3.2 kb of the PKD1 sequence (which is located on cosmid cDEB11), was mapped using a panel of somatic cell hybrids and found to hybridize to a single locus on chromosome 16p13. Sequence analysis confirmed that KG8 contains the polyadenylated 3' end of a gene and has an open reading frame (ORF) of 2100 bp and a 1068 bp 3' untranslated region. KG8 was also found to contain a polymorphic (CA) microsatellite repeat (Snarey). Analysis of this repeat in a large number of PKD1 kindreds revealed no recombination (Somlo).

To obtain clones extending 5' of KG8, the cosmids cGGG10 and cDEB11 were hybridized to different cDNA libraries. When some of the positive clones obtained from these screens were analyzed using somatic cell hybrid panels, they were found to hybridize strongly to several loci on chromosome 16 in addition to the PKD1 region. The restriction maps of the hybridizing loci were so similar that it was concluded that a series of recent duplications of part of the PKD1 gene (excluding the region from which the KG8 cDNA is derived) had occurred, giving rise to several PKD1-like genomic segments. This sequence duplication had been reported by the European PKD1 Consortium. Preliminary sequence analysis of the cDNA clones revealed that the PKD1 and PKD1-like loci give rise to two or more transcripts sharing 95-98% sequence identity. Because of this high degree of similarity between PKD1 and PKD1-like transcripts, it was not possible to determine the correct full-length PKD1 cDNA sequence simply by assembling overlapping partial cDNA clones.

To begin to determine the sequence of the authentic PKD1 transcript, therefore, it was concluded that the genomic PKD1 sequence should be compared to that of the PKD1-specific and PKD1-like cDNAs homologous to the genomic sequence. To that end, the entire cGGG10 cosmid and the PKD1 exon-containing parts of the cDEB11 cosmid were sequenced, as described below.

11.2.1. SEQUENCE OF THE GENOMIC REGION OF THE PKD1 LOCUS
The duplicated portion of the PKD1 gene is largely contained within the cosmid cGGG10. Prior to sequencing cGGG10, the integrity of the clone was established in several ways. First, the restriction map of cGGG10 was compared with the map of the genomic DNA from the PKD1 region. Second, restriction maps of the overlapping portions of cGGG1 and cDEB11 were compared with cGGG10. Third, sequences derived from cGGG10 and overlapping portions of cDEB11 showed 100% similarity. Finally, a P1 phage, PKD1521, was obtained by screening a genomic P1 library with primers from the TSC2 gene, which maps near the PKD1 gene. No sequence differences were found between PKD1521 and cGGG10.

It was necessary to pursue several approaches to obtain the sequence of cGGG10 (see Section 11.1, above). Briefly, because certain regions were difficult to sequence, modifications to standard automated sequencing chemistries had to be made. Both dye-terminator and dye-primer sequencing were used, as appropriate, for several different regions. Further, different polymerases and different melting and polymerization conditions were necessary to optimize the quality of the nucleotide sequence. When sequencing across the CpG island at the 5' end of the PKD1 gene, single-stranded templates were used in addition to a modified polymerization step.

A final ten-fold redundancy was achieved for the cGGG10 cosmid so that the genomic sequence could be accurately compared with that of the PKD1-specific and PKD1-like cDNAs homologous to this cosmid. The cGGG10 sequences were assembled into three contigs of 8 kb, 23 kb and 4.4 kb, separated by gaps of 1 kb and 2.2 kb. A two-fold redundancy was obtained for the cDEB11 cosmid, whose sequence was compared to PKD1 locus-specific cDNAs in order to obtain the intron/exon boundaries of the unique 3' end of the PKD1 gene.
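The contig and gap sizes above can be added up to give the sequenced length and the span of the cGGG10 assembly. They also give the approximate amount of raw sequence that ten-fold redundancy implies. The sketch assumes that redundancy is measured over the contig sequence only; that definition is an assumption, since the text does not state it.

```python
contigs_kb = [8.0, 23.0, 4.4]  # cGGG10 contig sizes reported above
gaps_kb = [1.0, 2.2]           # gaps between consecutive contigs
fold_redundancy = 10           # "ten-fold redundancy" for cGGG10

sequenced_kb = sum(contigs_kb)
span_kb = sequenced_kb + sum(gaps_kb)
# Assumption: redundancy = raw read bases / contig length, so the raw data
# needed is roughly fold_redundancy times the sequenced length.
raw_kb = fold_redundancy * sequenced_kb

print(f"sequenced {sequenced_kb:.1f} kb; span incl. gaps {span_kb:.1f} kb; ~{raw_kb:.0f} kb of raw reads")
```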

11.2.2. PKD1 AND PKD1-LIKE cDNAs
In order to identify putative coding regions and intron/exon boundaries, genomic and cDNA sequences were compared. cDNA clones had been identified in two ways. First, fragments of cosmids cGGG10 and cDEB were hybridized to five cDNA libraries. Second, each cDNA clone was hybridized to fetal kidney and lymphocyte cDNA libraries to obtain overlapping clones with which to extend the sequence (FIG. Z1B).

When the sequences of overlapping cDNAs were assembled, a PKD1 transcript length of 14.2 kb was obtained. The predominant transcript detected by Northern analysis using the unique-sequence KG8 probe is approximately 14 kb, suggesting that the cDNA clones represent the full length of the PKD1 transcript.

Restriction and sequence analyses indicate that a CpG island overlaps the 5' end of the sequence. CpG islands have been found to mark the 5' ends of many genes (Antequera). Further, the most 5' cDNA clones (UN53, UN54 and UN59) all have identical 5' ends, providing additional evidence that no upstream PKD1 exons were missed (see Section 11.1, above).

The multiple cDNAs used to assemble the PKD1 transcript, along with the genomic sequence, are shown in FIGS. 1A and 1B. By comparing the sequences of overlapping cDNAs and analyzing the degree of homology between the different cDNAs and the genomic sequence, it was possible to distinguish cDNAs encoded by the authentic PKD1 locus from those encoded by the homologous loci (see Section 11.1, above). The full-length PKD1 transcript constructed from these exons contains a large continuous open reading frame of 12,902 bp.
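Locating the open reading frame in an assembled transcript is a standard scan. The function below is a minimal forward-strand sketch: in each of the three frames it tracks the first ATG after the last stop and records ATG-to-stop stretches, ignoring any ORF that runs off the end without a stop. It is an illustration under those assumptions, not the procedure used to derive the 12,902 bp figure, and the test sequence is hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str):
    """Return (start, end, length_nt) of the longest ATG-initiated ORF on the forward strand.

    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    ORFs without an in-frame stop are ignored. Returns None if none is found.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG since the last stop in this frame
            elif start is not None and codon in STOPS:
                length = i + 3 - start
                if best is None or length > best[2]:
                    best = (start, i + 3, length)
                start = None
    return best


# Hypothetical toy transcript: a 15-nt ORF in frame 2 and a 9-nt ORF in frame 1.
toy = "CCATGAAATTTGGGTAACCATGCCCTGA"
print(longest_orf(toy))
```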

Significant sequence heterogeneity was observed in these cDNAs, suggesting that some alternative splicing of the primary PKD1 transcript occurs. For this reason, we sought to isolate a minimum of two cDNAs containing each exon, in order to increase the probability that all exons contributing to the PKD1 transcript were detected. Formally, however, it remains possible that PKD1 transcripts exist which contain exons not present in the cDNA clones sampled here.

Exon 17 was found in two cDNA clones (UN34 and BK156) and in the cosmid sequence, but the exon was not incorporated into the final PKD1 transcript, for several reasons. First, the cDNA clones in which this exon is found differed from the cosmid and are likely to represent PKD1-like genes rather than the authentic PKD1 gene (see Section 11.1, above). Second, this exon is not found in FK1, a cDNA which was cloned using a PKD1-specific probe (KG8). Finally, when included in the full-length cDNA, this exon introduces a stop codon (743 nucleotides downstream of exon 17) that would produce a truncated protein of 2651 amino acid residues. Further studies are needed to assess whether this exon may be used in different splice combinations in locus-specific transcripts. An ADPKD patient has been identified with a heterozygous mutation which introduces a stop codon at position 10,601 of the PKD1 open reading frame, and other mutations that truncate the PKD1 protein have also been reported by the European PKD1 Consortium. Since truncation of the PKD1 protein is associated with disease, it is unlikely that transcripts which include exon 17, and which would themselves encode a truncated protein, are the predominant forms in the kidney.

11.2.3. SEQUENCE ANALYSIS OF THE PREDICTED PKD1 PROTEIN
The assembly of 46 PKD1 exons yields a predicted transcript 14.2 kb in length, with 228 nucleotides of putative 5' untranslated sequence and 790 nucleotides of 3' untranslated sequence. The authentic PKD1 transcript differs from the previously reported 3' PKD1 sequence (EPKDC, 1994, Cell 77:881-894) by the presence of two extra cytosines at position 12873 of the PKD1 open reading frame (corresponding to PBP position 4563). Without these two cytosines the reading frame is shifted, so the previously reported sequence yielded an erroneous carboxy-terminal PKD1-derived amino acid sequence containing almost 80 additional amino acid residues. The presence of the two extra cytosines was confirmed with the cosmid sequence derived from cDEB11.
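Whether an insertion or deletion in a coding region shifts the reading frame depends only on its length modulo 3. The sketch applies that rule to the three length changes discussed in this section: the two extra cytosines, the extra CAG in BK194 and the BK241 insertion. For BK241 it assumes a single 28-nt copy is inserted. The helper name is hypothetical.

```python
def indel_effect(indel_len: int) -> str:
    """Classify an insertion or deletion inside a coding region by its length."""
    if indel_len == 0:
        raise ValueError("indel length must be non-zero")
    return "in-frame" if indel_len % 3 == 0 else "frameshift"


examples = {
    "two extra cytosines (CC)": len("CC"),
    "extra CAG in BK194": len("CAG"),
    "BK241 insertion (one copy assumed)": len("TTATCAATACTCTGGCTGACCATCGTCA"),
}
for label, n in examples.items():
    print(f"{label}: {n} nt -> {indel_effect(n)}")
```

The results agree with the text. The CC difference and the 28-nt BK241 insertion both shift the frame, while the extra CAG adds one codon and keeps the frame.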

The PKD1 protein derived from the assembled PKD1 transcript is 4304 amino acids in length, with a predicted molecular weight of 462 kilodaltons. The nucleotide sequence encompassing the Met-1 codon is CTAACGATGC, which represents an uncommon translation start site (Kozak, M. (1984) Nucleic Acids Res. 12:857-872). This methionine was determined to be the putative PKD1 translation start site because it is preceded by an in-frame stop codon 63 bases upstream. Furthermore, the PKD1 coding region begins with a 23-amino-acid region which exhibits many of the properties of a signal peptide and corresponding cleavage site (von Heijne, G. (1986) Nucleic Acids Res. 14:4683-4690; Welling, L.W. and Grantham, J.J. (1972) J. Clin. Invest. 51:1063-1075).
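The start-site argument rests on finding an in-frame stop codon upstream of the candidate ATG. Such a stop means translation could not have begun further 5' in the same frame. A minimal sketch of that check: it walks 5' from the ATG in steps of three and reports the distance from the first base of the nearest in-frame stop to the first base of the ATG. The distance convention and the test sequences are assumptions for illustration.

```python
STOPS = ("TAA", "TAG", "TGA")


def upstream_inframe_stop(seq: str, atg_index: int):
    """Distance (nt) from the nearest in-frame upstream stop codon to the candidate ATG.

    Scans 5' from the ATG at 0-based `atg_index` in steps of three codons.
    Returns None if no in-frame stop occurs before the start of the sequence.
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    for i in range(atg_index - 3, -1, -3):
        if seq[i:i + 3] in STOPS:
            return atg_index - i
    return None


# Hypothetical sequences: an in-frame TGA 18 nt upstream, and an out-of-frame TAG only.
print(upstream_inframe_stop("TGA" + "GCC" * 5 + "ATGGCC", 18))
print(upstream_inframe_stop("CTAGCCATG", 6))
```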

In addition to the signal sequence, the identification of five domains that have been identified in other proteins, and of a newly discovered domain, strongly suggests an extracellular location for at least the N-terminal half of the protein. Immediately downstream of the signal sequence there are two leucine-rich repeats (LRRs) (Figure 7). These LRRs are flanked on both sides by cysteine-rich regions which have homology to the flanking regions of a subset of other LRRs. LRRs occur in numerous proteins and have been shown to be involved in diverse forms of protein-protein interaction. The number of LRRs within the respective proteins varies between 2 and 29 (Kobe, B. and Deisenhofer, J. (1994) Trends Biochem. Sci. 19:415-421). Adhesive platelet glycoproteins form the largest group in the LRR superfamily (Kobe, B. and Deisenhofer, J. (1994) Trends Biochem. Sci. 19:415-421). The structure of the array of 15 LRRs in porcine ribonuclease inhibitor (RI) has recently been determined by crystallography (Kobe, B. and Deisenhofer, J. (1995) Nature 374:183-186); the LRRs of the RI protein form a horseshoe-like structure that surrounds RNase A (Kobe, B. and Deisenhofer, J. (1995) Nature 374:183-186). It has been suggested that proteins containing only a few LRRs, like the PKD1 protein, interact with other proteins via the LRRs in order to form the horseshoe-like superstructure for protein binding (Kobe, B. and Deisenhofer, J. (1994)).

Although LRRs occur in various locations in different proteins, the additional flanking cysteine-rich, disulfide bridge-containing domains define a subgroup of extracellular proteins (Kobe, B. and Deisenhofer, J. (1994)). Only a few proteins sequenced so far contain both the distinct N-terminal and C-terminal flanking cysteine-rich domains (Figures 7 and 8). Among this group are toll, slit, trk, trkB and trkC, which are all involved in cellular signal transduction. For example, the Drosophila toll protein is suspected to be involved in either adhesion or signaling required to mediate developmental events such as dorsal-ventral patterning (Hashimoto, C., Hudson, K.L. and Anderson, K.V. (1988) Cell 52:269-279). The Drosophila slit protein is thought to mediate interactions between growing axons and the surrounding matrix (Rothberg, J.M., Jacobs, J.R., Goodman, C.S. and Artavanis-Tsakonas, S. (1990) Genes Dev. 4:2169-2187). In vertebrates, these domains are found in the trk family of tyrosine kinase receptors; these proteins may relay cell or matrix adhesive events to the cytoplasm via a small carboxy-terminal kinase domain (Schneider, R. and Schweiger, M. (1991) Oncogene 6:1807-1811). It is interesting to note that all of the proteins with these cysteine-rich domains are involved in extracellular functions, many of which relate to cell adhesion. For example, the platelet glycoproteins I and V help mediate the adhesion of platelets to sites of vascular injury (Roth). The 5T4 oncofetal trophoblast glycoprotein appears to be highly expressed in metastatic tumors.

The PKD1 protein also contains a single domain with homology to C-type (calcium-dependent) lectin proteins (Figures 7 and 8). These domains are believed to be involved in the extracellular binding of carbohydrate residues for diverse purposes, including internalization of glycosylated enzymes (asialoglycoprotein receptors), participation in the extracellular matrix (versican) and cell adhesion (selectins) (Weis). The classification of C-type lectins has been based on exon organization and on the nature and arrangement of domains within the protein (Bezouska). For example, class I (extracellular proteoglycans) and class II (type II transmembrane receptors) lectins all have three exons encoding the carbohydrate recognition domain (CRD), whereas in classes III (collectins) and IV (LEC-CAMs) the domain is encoded by a single exon. The CRD in the PKD1 C-type lectin domain does not fit into this classification, because it has a novel combination of protein domains and because it is encoded by two exons (exons 5 and 6, Figure 6). Previous analysis has failed to establish a correlation between the type of carbohydrate bound by each C-type lectin and the primary structure of its CRD (Weis).

Exon 10 encodes an LDL-A module (amino acids 642-672, Figure 7), a cysteine-rich domain of about 40 amino acids in length. This module was originally identified in the LDL receptor (Sudhof), but it is also present in the extracellular portions of many other proteins, often in tandem arrays (Bork) (Figure 7). Because of their hydrophobic nature, these domains have been implicated as ligand-binding regions in LDL receptor-related proteins (Krieger). Other proteins that, like the PKD1 protein, contain a single or nontandem LDL-A module include the complement proteins (DiScipio, R.G., Gehring, M.R., Podack, E.R., Kan, C.C., Hugli, T.E. and Fey, G.H. (1984) Proc. Natl. Acad. Sci. USA 81:7298-7302), calf enterokinase (Kitamoto, Y., Yan, X.W., McCourt, D.W. and Sadler, J.E. (1994) Proc. Natl. Acad. Sci. USA 91:7588-7592) and a sarcoma virus adhesion protein.

In addition to extracellular protein modules that have been recognized previously, the PKD1 protein contains a novel domain of approximately 70 amino acids in length, present in 14 copies (Figures 7 and 8). The first copy is encoded by exon 5, between the LRRs and the C-type lectin module. The other PKD domains are placed consecutively, starting at amino acid 1100 and ending at amino acid 2331, and are contained in exons 13, 14 and 15. Profile and motif searches (see Section 11.1, above) identified several other extracellular proteins that also contain one or more copies of this novel domain, which we call the PKD domain. Whereas all known extracellular modules seem to be restricted to higher organisms, and the few exceptions seem to be evolutionary accidents (Doolittle), we found the PKD domain in extracellular parts of proteins from animals, eubacteria and archaebacteria. The animal proteins containing an individual PKD domain are heavily glycosylated, melanoma-associated cell surface proteins, such as the melanocyte-specific human Pmel17 (Kwon, B.S. (1993) J. Invest. Derm. (Supplement) 100:134-140), the MMP115 protein (Mochii, M., Agata, K. and Eguchi, G. (1991) Pigment Cell Res. 4:41-47) and the nmb protein (Weterman, M.A.J., Ajubi, N., van Dinter, I., Degen, W., van Muijen, G., Ruiter, D.J. and Bloemers, H.P.J. (1995) Int. J. Cancer 60:73-81). The physiological functions of these glycoproteins remain to be elucidated. Four eubacterial extracellular enzymes, three distinct collagenases and the lysine-specific Achromobacter protease I (API), also contain a single copy of the domain adjacent to their catalytic domains. Curiously, the highest degree of similarity between the collagenases is in the PKD domain, which may suggest that the domain in eukaryotic cells is involved in binding to collagenous domains. Four copies of the PKD domain are also present in the surface layer protein (SlpB) from Methanothermus (Yao). Like the Pmel17 family, the SlpB protein is heavily glycosylated, and it is predicted to be a glycoprotein component of the surface layer.

The PKD domain is predicted to be a globular domain that contains an antiparallel β-sheet. Although the PKD domains do not contain conserved cysteines, we believe they are extracellular domains because: 1) all identified homologues are extracellular, or their PKD domain lies in the extracellular part; 2) the first domain (amino acids 281-353) is located between other known extracellular modules; and 3) there are no predicted transmembrane regions between the other identified (extracellular) modules and the 13 remaining PKD domains. Whereas the PKD domains in SlpB are very similar, pointing to a rather recent duplication, the 14 domains in PKD1 are rather divergent. Even the most conserved motif (WDFGDG) (Fig. 7) is considerably modified in some of the PKD domains. Therefore, it is unlikely that unequal recombination between the genomic sequences encoding these repeated domains is a common source of mutations in this disease.
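Because the WDFGDG motif is modified in some PKD domains, an exact string search would miss divergent copies. The sketch below is a simple Hamming-distance scan that reports every six-residue window within a chosen number of mismatches of the motif. It is only an illustration of mismatch-tolerant searching, not the profile methods described in Section 11.1, and the protein fragments are hypothetical.

```python
MOTIF = "WDFGDG"


def motif_hits(protein: str, motif: str = MOTIF, max_mismatches: int = 2):
    """Return (position, window, mismatches) for each window within max_mismatches of motif."""
    protein = protein.upper()
    k = len(motif)
    hits = []
    for i in range(len(protein) - k + 1):
        window = protein[i:i + k]
        mismatches = sum(1 for a, b in zip(window, motif) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


# Hypothetical fragments: one exact copy, one copy with two substitutions.
print(motif_hits("AAWDFGDGKK"))
print(motif_hits("AAYDFGNGKK"))
```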
Although it was not possible to identify specific domains in the C-terminal half of the protein, a long region was found which showed similarity to a putative C. elegans chromosome III protein (accession number Z48544; Wilson). A hydrophobic stretch of 60 amino acids from residue 3986 to 4045 might represent a transmembrane domain, but it shows no clear resemblance to other such domains.
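One common way to flag hydrophobic stretches such as residues 3986-4045 is a sliding-window hydropathy profile. The sketch uses the Kyte-Doolittle scale with a 19-residue window and a mean-hydropathy threshold of 1.6, a widely used rough indicator of transmembrane segments. The document does not say which method was used, so this is an illustration only, and the test sequence is hypothetical. Residues outside the 20 standard amino acids raise a KeyError.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19):
    """Mean hydropathy of each full window; entry i covers protein[i:i + window]."""
    values = [KD[aa] for aa in protein.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def candidate_tm_segments(protein: str, window: int = 19, threshold: float = 1.6):
    """Merge overlapping windows scoring above `threshold` into 0-based (start, end) spans."""
    spans = []
    for i, score in enumerate(hydropathy_profile(protein, window)):
        if score > threshold:
            if spans and i <= spans[-1][1]:
                spans[-1] = (spans[-1][0], i + window)  # extend the current span
            else:
                spans.append((i, i + window))
    return spans


# Hypothetical sequence: a 25-residue leucine stretch between charged flanks.
toy = "D" * 20 + "L" * 25 + "D" * 20
print(candidate_tm_segments(toy))
```

Window-based spans extend somewhat past the hydrophobic core (here 15-50 for a core at 20-45), so boundaries from such a scan are approximate.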

11.2.4. IDENTIFICATION OF AN ADPKD-CAUSING MUTATION

SSCP analysis was performed on samples obtained from 60 patients, as described above in Section 10.1. One ADPKD individual with a variant pattern was identified by SSCP. Reamplification of amplified DNA from this individual (see Section 10.1, above) revealed that the patient carried a C-to-T transition at base pair 10,601 (exon 32) of the full-length PKD1 transcript. This mutation created a stop codon (TAG) at PKD1 amino acid position 765, which previously coded for a glutamine (CAG), thus truncating the final 728 amino acid residues normally present at the carboxy end of the PKD1 protein and yielding a mutant protein of 3576 amino acids. The mutation was also predicted to create a novel StyI site (CCCTAG); genomic DNA spanning this exon was amplified as before from the patient, his parents, and over 60 other unrelated individuals (120 alleles). After StyI digestion, only the patient ZC (#118) was heterozygous for the enzyme site. The absence of the sequence change in over 120 alleles establishes that this is not a polymorphic variant. The absence of the site in either parent establishes this as a new mutation, which correlates with the appearance of disease. Finally, the predicted impact on the protein (truncation) is by itself highly suggestive that the mutation would impair or alter protein function. Even without examination of the remainder of the gene or transcript in this patient, this evidence would generally be considered sufficient proof that this mutation is the cause of the disease.
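The diagnostic digest works because a single C-to-T change can turn a CAG glutamine codon into a TAG stop and also create a new restriction site. The sketch below checks both effects. It assumes the StyI recognition sequence CCWWGG (W = A or T), which is palindromic, so scanning one strand is enough. The 12-nt flanking context is made up and is not the actual PKD1 sequence, and the helper names are hypothetical.

```python
import re

# IUPAC symbols needed here; W means A or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]"}
STYI_SITE = "CCWWGG"  # StyI recognition sequence (palindromic)


def find_sites(seq, site=STYI_SITE):
    """Return 0-based start positions of every match to `site`, overlapping matches included."""
    pattern = re.compile("".join(IUPAC[b] for b in site))
    seq = seq.upper()
    return [i for i in range(len(seq) - len(site) + 1) if pattern.match(seq, i)]


def substitute(seq, pos, ref, alt):
    """Apply a single-base substitution at 0-based `pos`, checking the reference base first."""
    if seq[pos] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos]}")
    return seq[:pos] + alt + seq[pos + 1:]


# Hypothetical context with a CAG (Gln) codon at positions 5-7; NOT the real PKD1 sequence.
wild_type = "GATCCCAGGTTA"
mutant = substitute(wild_type, 5, "C", "T")  # C-to-T transition: CAG -> TAG

print(wild_type[5:8], "->", mutant[5:8])
print("StyI sites, wild type:", find_sites(wild_type))
print("StyI sites, mutant:   ", find_sites(mutant))
```

In a heterozygous carrier such as patient ZC, only the mutant allele carries the new site. The digest therefore shows both cut and uncut products, which is the pattern described above.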
12. DEPOSIT OF MICROORGANISMS
The following microorganisms were deposited with the American Type Culture Collection, Rockville, Maryland, on May 27, 1994 and assigned the indicated accession numbers:

Microorganism    ATCC Accession No.
KG8              59636
cGGG10           69634
cDEB11           69635

The present invention is not to be limited in scope by the specific embodiments described, which are intended as single illustrations of individual aspects of the invention, and functionally equivalent methods and components are within the scope of the invention. Indeed, various modifications of the invention, in addition to those shown and described herein, will become apparent to those skilled in the art from the foregoing description and accompanying drawings. Such modifications are intended to fall within the scope of the appended claims.
WHAT IS CLAIMED IS:

1. An isolated nucleic acid containing a nucleotide sequence which encodes a polycystic kidney disease (PKD1) gene product.

2. The isolated nucleic acid of Claim 1 which encodes 
the amino acid sequence (SEQ ID NO: 2) of the PKD1 gene 
product depicted in FIG. 6. 

3. The isolated nucleic acid of Claim 1 wherein the nucleotide sequence is the nucleotide sequence (SEQ ID NO: 1) depicted in FIG. 6.

4. The isolated nucleic acid of Claim 1 which hybridizes under stringent conditions to the complement of the coding sequence of the nucleotide sequence depicted in FIG. 6 (SEQ ID NO: 1), or which hybridizes under less stringent conditions and encodes a functionally equivalent PKD1 gene product.

5. A nucleic acid vector containing the nucleotide sequence of Claim 1, 2, 3 or 4.

6. An expression vector containing the nucleotide sequence of Claim 1, 2, 3 or 4 in operative association with a nucleotide regulatory element that controls expression of the nucleotide sequence in a host cell.

7. An antisense molecule containing the nucleotide sequence of Claim 4.

8. A ribozyme molecule containing the nucleotide sequence of Claim 4.

9. A triple helix molecule containing the nucleotide sequence of Claim 4.
10. The nucleotide vector of Claim 5 which is a plasmid vector.

11. The nucleotide vector of Claim 5 which is a viral vector.

12. A genetically engineered host cell containing the nucleotide sequence of Claim 1, 2, 3 or 4.

13. A genetically engineered host cell containing the nucleotide sequence of Claim 1, 2, 3 or 4 in operative association with a regulatory element that controls expression of the nucleotide sequence in the host cell.

14. A substantially pure PKD1 gene product.

15. The substantially pure PKD1 gene product of Claim 14 wherein the gene product contains the amino acid sequence (SEQ ID NO: 2) depicted in FIG. 6.

15. An antibody that immunospecifically binds to a PKD1 gene product.

16. A method for diagnosing autosomal dominant polycystic kidney disease, comprising detecting a mutant PKD1 gene or gene product in a patient sample.

17. A method for treating autosomal dominant polycystic kidney disease, comprising administering an effective amount of a compound to a patient in need of such treatment, which compound inhibits the synthesis, expression or activity of a mutant PKD1 gene product.

18. The method of Claim 17 in which the compound is an antisense or ribozyme molecule that blocks translation of mutant PKD1 mRNA.

19. The method of Claim 18 in which the compound is a nucleotide that is complementary to the 5' region of the PKD1 gene, and blocks transcription of the PKD1 gene via triple helix formation.



20. The method of Claim 19 further comprising replacing 
the mutant PKD1 gene with a normal allele, or replacing the 
mutant PKD1 gene product with a normal PKD1 gene product. 



21. The method of Claim 19 in which the compound is an antibody that immunospecifically binds and inactivates the mutant PKD1 gene product.

22. A method for treating autosomal dominant polycystic kidney disease, comprising administering a normal allele of the PKD1 gene to a patient in need of such treatment, so that the normal PKD1 allele is expressed in the patient.

23. A method for treating autosomal dominant polycystic kidney disease, comprising administering an effective amount of a normal PKD1 gene product to a patient in need of such therapy.

24. A method of measuring the presence of a PKD1 gene product in a sample, comprising:

(a) contacting the sample suspected of containing a PKD1 gene product with an antibody that binds to the PKD1 gene product under conditions which allow for the formation of reaction complexes comprising the antibody and the PKD1 gene product; and

(b) detecting the formation of reaction complexes comprising the antibody and PKD1 gene product in the sample, in which detection of the formation of reaction complexes indicates the presence of the PKD1 gene product in the sample.
25. The method of Claim 24 in which the antibody is bound to a solid phase support.

26. The method of Claim 24 in which the PKD1 gene product is bound to a solid phase support.

27. The method of Claim 25 or 26 which additionally comprises contacting the sample with a labeled PKD1 gene product in step (a), and removing unbound substances prior to step (b), in which a decrease in the amount of reaction complexes comprising the antibody and the labeled PKD1 gene product indicates the presence of the PKD1 gene product in the sample.

28. A method of evaluating the level of PKD1 gene product in a biological sample comprising:

(a) detecting the formation of reaction complexes in a biological sample according to the method of Claim 24; and

(b) evaluating the amount of reaction complexes formed, which amount of reaction complexes corresponds to the level of PKD1 gene product in the biological sample.

29. A method of detecting or diagnosing the presence of a disease associated with elevated or decreased levels of PKD1 gene product in a mammalian subject comprising:

(a) evaluating the level of PKD1 gene product in a biological sample from the mammalian subject according to Claim 28; and

(b) comparing the level detected in step (a) to a level of PKD1 gene product present in normal subjects or in the subject at an earlier time, in which an increase or a decrease in the level of the PKD1 gene product as compared to normal levels indicates a disease condition.
30. A method for monitoring a therapeutic treatment of
a disease associated with elevated or decreased levels of
PKD1 gene product in a mammalian subject, comprising
evaluating the levels of the PKD1 gene product in a series of
biological samples obtained at different time points from a
mammalian subject undergoing a therapeutic treatment for a
disease associated with elevated or decreased levels of PKD1
gene product, according to the method of Claim 28.

31. The method according to Claim 29 or 30 wherein the
disease associated with decreased levels of PKD1 gene product
is selected from the group consisting of polycystic kidney
disease and acquired cystic disease.

32. A test kit for measuring the presence of or amount
of PKD1 gene product in a sample, comprising:

(a) an antibody that immunospecifically binds to a
PKD1 gene product;

(b) means for detecting binding of the anti-PKD1
gene product antibody to PKD1 gene product in
a sample;

(c) other reagents; and

(d) directions for use of the kit.

33. A pharmaceutical composition for treating
polycystic kidney disease in a mammal, comprising the PKD1
gene product of Claim 14 and a pharmaceutically acceptable
carrier.

34. A method for treating polycystic kidney disease in
a mammal comprising administering an amount of a
pharmaceutical composition of Claim 33 effective to
ameliorate the symptoms of polycystic kidney disease.



35. A method for treating polycystic kidney disease in
a mammal comprising increasing the expression of a protein
encoded by the nucleic acid of Claim 1, 2, 3 or 4.
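Claim 27 recites a competitive format in which a decrease in labeled complexes, rather than an increase, signals that PKD1 gene product is present in the sample. A minimal Python sketch of why the signal falls, under an idealized model that is an assumption and not part of the disclosure: the antibody binding sites are limiting, labeled and sample-derived PKD1 gene product bind with equal affinity, and occupied sites are shared in proportion to the amount of each. The function names `labeled_complexes` and `relative_signal` are hypothetical.

```python
def labeled_complexes(capacity: float, labeled: float, sample_analyte: float) -> float:
    """Amount of antibody/labeled-antigen complex under a hypothetical,
    idealized competitive model: `capacity` antibody sites, labeled and
    unlabeled (sample) antigen bind with equal affinity, and occupied
    sites are shared in proportion to the amount of each antigen."""
    if capacity <= 0 or labeled <= 0 or sample_analyte < 0:
        raise ValueError("capacity and labeled must be > 0; sample_analyte must be >= 0")
    total_antigen = labeled + sample_analyte
    occupied = min(capacity, total_antigen)  # cannot exceed sites or antigen
    return occupied * labeled / total_antigen


def relative_signal(capacity: float, labeled: float, sample_analyte: float) -> float:
    """Labeled-complex signal relative to a blank with no sample antigen.
    Values below 1.0 correspond to the 'decrease' recited in Claim 27."""
    blank = labeled_complexes(capacity, labeled, 0.0)
    return labeled_complexes(capacity, labeled, sample_analyte) / blank


if __name__ == "__main__":
    for s in (0.0, 50.0, 200.0, 800.0):
        print(s, relative_signal(capacity=100.0, labeled=200.0, sample_analyte=s))
```

With 100 sites and 200 units of labeled antigen, the relative signal drops from 1.0 (no sample antigen) to 0.8, 0.5 and 0.2 as sample antigen rises to 50, 200 and 800 units. Real assays depend on affinities and incubation conditions that this sketch ignores.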
1 ATG CCG CCC GCC GCG CCC GCC CGC CTG GCG CTG GCC CTG GGC CTG GGC CTG TGG CTC GGG 60

1 MPPAAPARLALALGLGLWLG 20

61 GCG CTG GCG GGG GGG CCC GGG CGC GGC TGC GGG CCC TGC GAG CCC CCC TGC CTC TGC GGG 120 

21 ALAGGPGRGCGPCEPPCLCG 40

121 CCA GCG CCC GGC GCC GCC TGC CGC GTC AAC TGC TCG GGC CGC GGG CTG CGG ACG CTC GGT 180 

41 PAPGAACRVNCSGR GLRTLG 60 

181 CCC GCG CTG CGC ATC CCC GCG GAC GCC ACA GAG CTA GAC GTC TCC CAC AAC CTG CTC CGG 240 

61 PALRIP ADATELDVSHNLLR 80 

241 GCG CTG GAC GTT GGG CTC CTG GCG AAC CTC TCG GCG CTG GCA GAG CTG GAT ATA AGC AAC 300 

81 ALDVGL LANLSALAELDISN 100 

301 AAC AAG ATT TCT ACG TTA GAA GAA GGA ATA TTT GCT AAT TTA TTT AAT TTA AGT GAA ATA 360 

101 NKISTLEEGIFANLFNLSEI 120 

361 AAC CTG AGT GGG AAC CCG TTT GAG TGT GAC TGT GGC CTG GCG TGG CTG CCG CAA TGG GCG 420 

121 NLSGNPFECDCGLAWLPQWA 140

421 GAG GAG CAG CAG GTG CGG GTG GTG CAG CCC GAG GCA GCC ACG TGT GCT GGG CCT GGC TCC 480

141 E E Q QVRVVQP EAATCAGPG S 160 

481 CTG GCT GGC CAG CCT CTG CTT GGC ATC CCC TTG CTG GAC AGT GGC TGT GGT GAG GAG TAT 540

161 LAG QPLLGIPLLDSGCGEEY 180 

541 GTC GCC TGC CTC CCT GAC AAC AGC TCA GGC ACC GTG GCA GCA GTG TCC TTT TCA GCT GCC 600 

181 VACLPDNSSGTVAAVSFSAA 200

601 CAC GAA GGC CTG CTT CAG CCA GAG GCC TGC AGC GCC TTC TGC TTC TCC ACC GGC CAG GGC 660 

201 HEGLLQPEACSAFCFSTGQG 220

661 CTC GCA GCC CTC TCG GAG CAG GGC TGG TGC CTG TGT GGG GCG GCC CAG CCC TCC AGT GCC 720 

221 L A A L SEQGWCLCGAAQ PSSA 240 

721 TCC TTT GCC TGC CTG TCC CTC TGC TCC GGG CCC CCG GCA CCT CCT GCC CCC ACC TGT AGG 780 

241 SFACLSLCSGPPAPPAPTCR 260

781 GGC CCC ACC CTC CTC CAG CAC GTC TTC CCT GCC TCC CCA GGG GCC ACC CTG GTG GGG CCC 840

261 G PT LLQHVFPASPGATLVGP 280 

841 CAC GGA CCT CTG GCC TCT GGC CAG CTA GCA GCC TTC CAC ATC GCT GCC CCG CTC CCT GTC 900 

281 HGPLASGQLAAFHIAAPLPV 300

901 ACT GAC ACA CGC TGG GAC TTC GGA GAC GGC TCC GCC GAG GTG GAT GCC GCT GGG CCG GCT 960

301 TDTRWDFGDGSAEVDAAGPA 320

961 GCC TCG CAT CGC TAT GTG CTG CCT GGG CGC TAT CAC GTG ACG GCC GTG CTG GCC CTG GGG 1020

321 ASHRYVLPGRYHVTAVLALG 340

1021 GCC GGC TCA GCC CTG CTG GGG ACA GAC GTG CAG GTG GAA GCG GCA CCT GCC GCC CTG GAG 1080 

341 AGSALLGTDVQVEAAPAALE 360 

1081 CTC GTG TGC CCG TCC TCG GTG CAG AGT GAC GAG AGC CTC GAC CTC AGC ATC CAG AAC CGC 1140

361 LVC PSSVQSDESLDLS I Q N R 380 

1141 GGT GGT TCA GGC CTG GAG GCC GCC TAC AGC ATC GTG GCC CTG GGC GAG GAG CCG GCC CGA 1200

381 GGSGLEAAYSIVALGEEPAR 400 

1201 GCG GTG CAC CCG CTC TGC CCC TCG GAC ACG GAG ATC TTC CCT GGC AAC GGG CAC TGC TAC 1260 

401 A V H PLCPSDTEIFPGNGHCY 420 
1261 CGC CTG GTG GTG GAG AAG GCG GCC TGG CTG GAG GCG CAG GAG CAG TGT CAG GCC TGG GCC 1320

421 RLVVEKAAWLQAQEQCQAWA 440

1321 GGG GCC GCC CTG GCA ATG GTG GAC AGT CCC GCC GTG CAG CGC TTC CTG GTC TCC CGG GTC 1380 

441 GAALAMVDSPAVQRFLVSRV 460

1381 ACC AGG AGC CTA GAC GTG TGG ATC GGC TTC TCG ACT GTG CAG GGG GTG GAG GTG GGC CCA 1440

461 TRSLDVWIGFSTVQGVEVGP 480 

1441 GCG CCG CAG GGC GAG GCC TTC AGC CTG GAG AGC TGC CAG AAC TGG CTG CCC GGG GAG CCA 1500 

481 APQGEAFSLESCQNWLPGEP 500

1501 CAC CCA GCC ACA GCC GAG CAC TGC GTC CGG CTC GGG CCC ACC GGG TGG TGT AAC ACC GAC 1560 

501 HPATAEHCVRLGPTGWCNTD 520

1561 CTG TGC TCA GCG CCG CAC AGC TAC GTC TGC GAG CTG CAG CCC GGA GGC CCA GTG CAG GAT 1620 

521 LCSAPHSYVCELQPGGPVQD 540

1621 GCC GAG AAC CTC CTC GTG GGA GCG CCC AGT GGG GAC CTG CAG GGA CCC CTG ACG CCT CTG 1680 

541 AENLLVGAPSGDLQGPLTPL 560

1681 GCA CAG CAG GAC GGC CTC TCA GCC CCG CAC GAG CCC GTG GAG GTC ATG GTA TTC CCG GGC 1740 

561 AQQDGLSAPHEPVEVMVFPG 580

1741 CTG CGT CTG AGC CGT GAA GCC TTC CTC ACC ACG GCC GAA TTT GGG ACC CAG GAG CTC CGG 1800 

581 LRLSREAFLTTAEFGTQELR 600

1801 CGG CCC GCC CAG CTG CGG CTG CAG GTG TAC CGG CTC CTC AGC ACA GCA GGG ACC CCG GAG 1860 

601 R P A QLRLQVYRLLSTAG TPE 620 

1861 AAC GGC AGC GAG CCT GAG AGC AGG TCC CCG GAC AAC AGG ACC CAG CTG - CCC CCC GCG TGC 1920 

621 NGSEPESRSPDNRTQLAPAC 640

1921 ATG CCA GGG GGA CGC TGG TGC CCT GGA GCC AAC ATC TGC TTG CCG CTG GAC GCC TCC TGC 1980 

641 MPG GRWCPGANICLPL D ASC 660 

1981 CAC CCC CAG GCC TGC GCC AAT GGC TGC ACG TCA GGG CCA GGG CTA CCC GGG GCC CCC TAT 2040

661 HPQACANGCTSGPGLPGAPY 680

2041 GCG CTA TGG AGA GAG TTC CTC TTC TCC GTT CCC GCG GGG CCC CCC GCG CAG TAC TCG GTC 2100 

681 A L W R EFLFSV P AG PPA Q Y S V 700 

2101 ACC CTC CAC GGC CAG GAT GTC CTC ATG CTC CCT GGT GAC CTC GTT GGC TTG CAG CAC GAC 2160 

701 TLH G QDVLML PGDLVG L Q H D 720 

2161 GCT GGC CCT GGC GCC CTC CTG CAC TGC TCG CCG GCT CCC GGC CAC CCT GGT CCC CGG GCC 2220 

721 AGP GALLHCS PAP G HP GPRA 740 

2221 CCG TAC CTC TCC GCC AAC GCC TCG TCA TGG CTG CCC CAC TTG CCA GCC CAG CTG GAG GGC 2280 

741 PYLSANASSWLPHLPAQLEG 760 

2281 ACT TGG GGC TGC CCT GCC TGT GCC CTG CGG CTG CTT GCA CAA CGG GAA CAG CTC ACC GTG 2340 

761 TWGCPACALRLLAQREQLTV 780

2341 CTG CTG GGC TTG AGG CCC AAC CCT GGA CTG CGG CTG CCT GGG CGC TAT GAG GTC CGG GCA 2400 

781 LLGLRPNPGLRLPGRYEVRA 800

2401 GAG GTG GGC AAT GGC GTG TCC AGG CAC AAC CTC TCC TGC AGC TTT GAC GTG GTC TCC CCA 2460 

801 EVGNGVSRHNLSCSFDVVSP 820

2461 GTG GCT GGG CTG CGG GTC ATC TAC CCT GCC CCC CGC GAC GGC CGC CTC TAC GTG CCC ACC 2520

821 V A G LRVIYPAPRDGRLYVPT 840 
2521 AAC GGC TCA GCC TTG GTG CTC CAG GTG GAC TCT GGT GCC AAC GCC ACG GCC ACG GCT CGC 2580

841 NGSALVLQVDSGANATATAR 860 

2581 TGG CCT GGG GGC AGT CTC AGC GCC CGC TTT GAG AAT GTC TGC CCT GCC CTG GTG GCC ACC 2640 

861 WPGGSLSARFENVCPALVAT 880 

2641 TTC GTG CCC GCC TGC CCC TGG GAG ACC AAC GAT ACC CTG TTC TCA GTG GTA GCA CTG CCG 2700 

881 FVPACPWETNDTLFSVVALP 900

2701 TGG CTC AGT GAG GGG GAG CAC GTG GTG GAC GTG GTG GTG GAA AAC AGC GCC AGC CGG GCC 2760 

901 WLSEGEHVVDVVVENSASRA 920

2761 AAC CTC AGC CTG CGG GTG ACG GCG GAG GAG CCC ATC TGT GGC CTC CGC GCC ACG CCC AGC 2820 

921 NLSLRVTAEEPICGLRATPS 940

2821 CCC GAG GCC CGT GTA CTG CAG GGA GTC CTA GTG AGG TAC AGC CCC GTG GTG GAG GCC GGC 2880 

941 PEARVLQGVLVRYSPVVEAG 960

2881 TCG GAC ATG GTC TTC CGG TGG ACC ATC AAC GAC AAG CAG TCC CTG ACC TTC CAG AAC GTG 2940

961 SDMVFRWTINDKQSLTFQNV 980

2941 GTC TTC AAT GTC ATT TAT CAG AGC GCG GCG GTC TTC AAG CTC TCA CTG ACG GCC TCC AAC 3000

981 V FN V I Y Q SAAVFKLSL TASN 1000 

3001 CAC GTG AGC AAC GTC ACC GTG AAC TAC AAC GTA ACC GTG GAG CGG ATG AAC AGG ATG CAG 3060

1001 HVSNVTVNYNVTVERMNRMQ 1020

3061 GGT CTG CAG GTC TCC ACA GTG CCG GCC GTG CTG TCC CCC AAT GCC ACG CTA GCA CTG ACG 3120 

1021 G LQ VSTVPAVLS P NAT L A L T 1040 

3121 GCG GGC GTG CTG GTG GAC TCG GCC GTG GAG GTG GCC TTC CTG TGG ACC TTT GGG GAT GGG 3180 

1041 AGVLVDSAVEVAFLWT FGD G 1060 

3181 GAG CAG GCC CTC CAC CAG TTC CAG CCT CCG TAC AAC GAG TCC TTC CCA GTT CCA GAC CCC 3240 

1061 EQALHQFQPPYNESFPVPDP 1080

3241 TCG GTG GCC CAG GTG CTG GTG GAG CAC AAT GTC ACG CAC ACC TAC GCT GCC CCA GGT GAG 3300

1081 S VA QVLVEHNVTHTYAAPGE 1100 

3301 TAC CTC CTG ACC GTG CTG GCA TCT AAT GCC TTC GAG AAC CTG ACG CAG CAG GTG CCT GTG 3360

1101 YLLTVLASNAFENLTQQVPV 1120

3361 AGC GTG CGC GCC TCC CTG CCC TCC GTG GCT GTG GGT GTG AGT GAC GGC GTC CTG GTG GCC 3420

1121 SVRASLPSVAVGVSDGVLVA 1140 

3421 GGC CGG CCC GTC ACC TTC TAC CCG CAC CCG CTG CCC TCG CCT GGG GGT GTT CTT TAC ACG 3480

1141 GRPVTFYPHPLPSPGGVLYT 1160

3481 TGG GAC TTC GGG GAC GGC TCC CCT GTC CTG ACC CAG AGC CAG CCG GCT GCC AAC CAC ACC 3540

1161 W D F G DG S PVLTQ S Q PAANHT 1180 

3541 TAT GCC TCG AGG GGC ACC TAC CAC GTG CGC CTG GAG GTC AAC AAC ACG GTG AGC GGT GCG 3600 

1181 YASRGTYHVRLEVNNTVSGA 1200

3601 GCG GCC CAG GCG GAT GTG CGC GTC TTT GAG GAG CTC CGC GGA CTC AGC GTG GAC ATG AGC 3660 

1201 AA QADVRVFEELRGLSVDMS 1220 

3661 CTG GCC GTG GAG CAG GGC GCC CCC GTG GTG GTC AGC GCC GCG GTG CAG ACG GGC GAC AAC 3720

1221 LAVEQGAPVVVSAAVQTGDN 1240

3721 ATC ACG TGG ACC TTC GAC ATG GGG GAC GGC ACC GTG CTG TCG GGC CCG GAG GCA ACA GTG 3780 

1241 ITWTFDMGDGTVLSGPEATV 1260
3781 GAG CAT GTG TAC CTG CGG GCA CAG AAC TGC ACA GTG ACC GTG GGT GCG GGC AGC CCC GCC 3840

1261 EHVYLRAQNCTVTVGAGS PA 1280 

3841 GGC CAC CTG GCC CGG AGC CTG CAC GTG CTG GTC TTC GTC CTG GAG GTG CTG CGC GTT GAA 3900 

1281 GHLARSLHVLVFVLEVLRVE 1300

3901 CCC GCC GCC TGC ATC CCC ACG CAG CCT GAC GCG CGG CTC ACG GCC TAC GTC ACC GGG AAC 3960 

1301 PAACIPTQPDARLTAYVTGN 1320 

3961 CCG GCC CAC TAC CTC TTC GAC TGG ACC TTC GGG GAT GGC TCC TCC AAC ACG ACC GTG CGG 4020

1321 PAHYLFDWTFGDGSSNTTVR 1340

4021 GGG TGC CCG ACG GTG ACA CAC AAC TTC ACG CGG AGC GGC ACG TTC CCC CTG GCG CTG GTG 4080 

1341 GC PTVTHNFTRSGTFPLALV 1360 

4081 CTG TCC AGC CGC GTG AAC AGG GCG CAT TAC TTC ACC AGC ATC TGC GTG GAG CCA GAG GTG 4140

1361 LSSRVNRAHYFTSICVEPEV 1380

4141 GGC AAC GTC ACC CTG CAG CCA GAG AGG CAG TTT GTG CAG CTC GGG GAC GAG GCC TGG CTG 4200 

1381 GNVTLQPERQFVQLGDEAWL 1400

4201 GTG GCA TGT GCC TGG CCC CCG TTC CCC TAC CGC TAC ACC TGG GAC TTT GGC ACC GAG GAA 4260 

1401 VACAWPPFPYRYTWDFGTEE 1420 

4261 GCC GCC CCC ACC CGT GCC AGG GGC CCT GAG GTG ACG TTC ATC TAC CGA GAC CCA GGC TCC 4320 

1421 AAPTRARGPEVTFIYRDPGS 1440

4321 TAT CTT GTG ACA GTC ACC GCG TCC AAC AAC ATC TCT GCT GCC AAT GAC TCA GCC CTG GTG 4380 

1441 Y L V TVTASNNI SAAND SA LV 1460 

4381 GAG GTG CAG GAG CCC GTG CTG GTC ACC AGC ATC AAG GTC AAT GGC TCC CTT GGG CTG GAG 4440

1461 EVQ EPVLVTSIKVNG S LGLE 1480 

4441 CTG CAG CAG CCG TAC CTG TTC TCT GCT GTG GGC CGT GGG CGC CCC GCC AGC TAC CTG TGG 4500 

1481 LQQ PYLFSAVGRGRPA SYLW 1500 

4501 GAT CTG GGG GAC GGT GGG TGG CTC GAG GGT CCG GAG GTC ACC CAC GCT TAC AAC AGC ACA 4560 

1501 DLGDGGWLEGPEVTHAYNST 1520

4561 GGT GAC TTC ACC GTT AGG GTG GCC GGC TGG AAT GAG GTG AGC CGC AGC GAG GCC TGG CTC 4620 

1521 GDFTVRVAGWNEVSRS EAWL 1540 

4621 AAT GTG ACG GTG AAG CGG CGC GTG CGG GGG CTC GTC GTC AAT GCA AGC CGC ACG GTG GTG 4680 

1541 NVT VKRRVRGLVVNAS RTVV 1560 

4681 CCC CTG AAT GGG AGC GTG AGC TTC AGC ACG TCG CTG GAG GCC GGC AGT GAT GTG CGC TAT 4740 

1561 P L N G S VSF S TSLEAGS DVRY 1580 

4741 TCC TGG GTG CTC TGT GAC CGC TGC ACG CCC ATC CCT GGG GGT CCT ACC ATC TCT TAC ACC 4800

1581 SWV LCDRCTPIPGGPT ISYT 1600 

4801 TTC CGC TCC GTG GGC ACC TTC AAT ATC ATC GTC ACG GCT GAG AAC GAG GTG GGC TCC GCC 4860 

1601 FR S VGTFNI IVTAENEVG SA 1620 

4861 CAG GAC AGC ATC TTC GTC TAT GTC CTG CAG CTC ATA GAG GGG CTG CAG GTG GTG GGC GGT 4920 

1621 QDS I FVYVLQLI EGLQVVGG 1640 

4921 GGC CGC TAC TTC CCC ACC AAC CAC ACG GTA CAG CTG CAG GCC GTG GTT AGG GAT GGC ACC 4980 

1641 GRYFPTNHTVQLQAVVRDGT 1660

4981 AAC GTC TCC TAC AGC TGG ACT GCC TGG AGG GAC AGG GGC CCG GCC CTG GCC GGC AGC GGC 5040 

1661 NVS YSWTAWRDRGPALAGSG 1680 
5041 AAA GGC TTC TCG CTC ACC GTG CTC GAG GCC GGC ACC TAC CAT GTG CAG CTG CGG GCC ACC

1681 KGFSLTVLEAGTYHVQLRAT 

5101 AAC ATG CTG GGC AGC GCC TGG GCC GAC TGC ACC ATG GAC TTC GTG GAG CCT GTG GGG TGG 

1701 NM LGSAWADCTMDFVEPVGW 

5161 CTG ATG GTG GCC GCC TCC CCG AAC CCA GCT GCC GTC AAC ACA AGC GTC ACC CTC AGT GCC 

1721 LM VAASPNPAAVNTSVTLSA 

5221 GAG CTG GCT GGT GGC AGT GGT GTC GTA TAC ACT TGG TCC TTC GAG GAG GGG CTG AGC TGG 

1741 E LAGGSGVVYTW S LE E GL SW 

5281 GAG ACC TCC GAG CCA TTT ACC ACC CAT AGC TTC CCC ACA CCC GGC CTG CAC TTG GTC ACC 

1761 ETSEPFTTHSFPTPGLHLVT

5341 ATG ACG GCA GGG AAC CCG CTG GGC TCA GCC AAC GCC ACC GTG GAA GTG GAT GTG CAG GTG 

1781 MTAGNPLGSANATVEVDVQV

5401 CCT GTG AGT GGC CTC AGC ATC AGG GCC AGC GAG CCC GGA GGC AGC TTC GTG GCG GCC GGG 

1801 P V S GLSIRASEPGGSFVAAG 

5461 TCC TCT GTG CCC TTT TGG GGG CAG CTG GCC ACG GGC ACC AAT GTG AGC TGG TGC TGG GCT 

1821 SSVPFWGQLATGTNVSWCWA

5521 GTG CCC GGC GGC AGC AGC AAG CGT GGC CCT CAT GTC ACC ATG GTC TTC CCG GAT GCT GGC 

1841 VPGGSSKRGPHVTMVFPDAG

5581 ACC TTC TCC ATC CGG CTC AAT GCC TCC AAC GCA GTC AGC TGG GTC TCA GCC ACG TAC AAC 

1861 TFS IRLNASNAVSWVSATYN 

5641 CTC ACG GCG GAG GAG CCC ATC GTG GGC CTG GTG CTG TGG GCC AGC AGC AAG GTG GTG GCG 

1881 L T A E E P IVGLVLWAS S KVVA 

5701 CCC GGG CAG CTG GTC CAT TTT CAG ATC CTG CTG GCT GCC GGC TCA GCT GTC ACC TTC CGC

1901 PGQLVHFQILLAAGSAVTFR

5761 CTA CAG GTC GGC GGG GCC AAC CCC GAG GTG CTC CCC GGG CCC CGT TTC TCC CAC AGC TTC 

1921 LQVGGAN PEVLPGPRFSHSF 

5821 CCC CGC GTC GGA GAC CAC GTG GTG AGC GTG CGG GGC AAA AAC CAC GTG AGC TGG GCC CAG 

1941 P R V GDHVVS VRGKNHVSWAQ 

5881 GCG CAG GTG CGC ATC GTG GTG CTG GAG GCC GTG AGT GGG CTG CAG GTG CCC AAC TGC TGC 

1961 AQVRIVVLEAVSGLQVPNCC

5941 GAG CCT GGC ATC GCC ACG GGC ACT GAG AGG AAC TTC ACA GCC CGC GTG CAG CGC GGC TCT 

1981 EPGIATGTERNFTARVQRGS

6001 CGG GTC GCC TAC GCC TGG TAC TTC TCG CTG CAG AAG GTC CAG GGC GAC TCG CTG GTC ATC

2001 RVA YAWYFSLQKVQGDSLVI 

6061 CTG TCG GGC CGC GAC GTC ACC TAC ACG CCC GTG GCC GCG GGG CTG TTG GAG ATC CAG GTG 

2021 LSGRDVTYTPVAAGLLEIQV 

6121 CGC GCC TTC AAC GCC CTG GGC AGT GAG AAC CGC ACG CTG GTG CTG GAG GTT CAG GAC GCC 

2041 RAFNALGSENRTLVLEVQDA 

6181 GTC CAG TAT GTG GCC CTG CAG AGC GGC CCC TGC TTC ACC AAC CGC TCG GCG CAG TTT GAG 

2061 VQYVALQSGPCFTNRSAQFE 

6241 GCC GCC ACC AGC CCC AGC CCC CGG CGT GTG GCC TAC CAC TGG GAC TTT GGG GAT GGG TCG 

2081 AAT SPSPRRVAYHWDFG DGS 
6301 CCA GGG CAG GAC ACA GAT GAG CCC AGG GCC GAG CAC TCC TAC CTG AGG CCT GGG GAC TAC 6360

2101 PGQDTDEPRAEHSYLRPGDY 2120

6361 CGC GTG CAG GTG AAC GCC TCC AAC CTG GTG AGC TTC TTC GTG GCG CAG GCC ACG GTG ACC 6420

2121 RVQVNASNLVSFFVAQATVT 2140

6421 GTC CAG GTG CTG GCC TGC CGG GAG CCG GAG GTG GAC GTG GTC CTG CCC CTG CAG GTG CTG 6480 

2141 VQVLACREPEVDVVLPLQVL 2160

6481 ATG CGG CGA TCA CAG CGC AAC TAC TTG GAG GCC CAC GTT GAC CTG CGC GAC TGC GTC ACC 6540

2161 MRRSQRNYLEAHVDLRDCVT 2180

6541 TAC CAG ACT GAG TAC CGC TGG GAG GTG TAT CGC ACC GCC AGC TGC CAG CGG CCG GGG CGC 6600 

2181 YQTEYRWEVYRTAS CQRPGR 2200 

6601 CCA GCG CGT GTG GCC CTG CCC GGC GTG GAC GTG AGC CGG CCT CGG CTG GTG CTG CCG CGG 6660 

2201 PA RVALPGVDVSRPRL VLPR 2220 

6661 CTG GCG CTG CCT GTG GGG CAC TAC TGC TTT GTG TTT GTC GTG TCA TTT GGG GAC ACG CCA 6720 

2221 LALPVGHYCFVFVVSFGDTP 2240

6721 CTG ACA CAG AGC ATC CAG GCC AAT GTG ACG GTG GCC CCC GAG CGC CTG GTG CCC ATC ATT 6780

2241 LT Q S I QANVTVAPERL VP I I 2260 

6781 GAG GGT GGC TCA TAC CGC GTG TGG TCA GAC ACA CGG GAC CTG GTG CTG GAT GGG AGC GAG 6840

2261 EGGSYRVWSDTRDLVLDGSE 2280

6841 TCC TAC GAC CCC AAC CTG GAG GAC GGC GAC CAG ACG CCG CTC AGT TTC CAC TGG GCC TGT 6900 

2281 SYDPNLEDGDQTPLSFHWAC 2300

6901 GTG GCT TCG ACA CAG AGG GAG GCT GGC GGG TGT GCG CTG AAC TTT GGG CCC CGC GGG AGC 6960 

2301 VASTQREAGGCALNFGPRGS 2320

6961 AGC ACG GTC ACC ATT CCA CGG GAG CGG CTG GCG GCT GGC GTG GAG TAC ACC TTC AGC CTG 7020

2321 S T V T I PRERLA AGVEY T FSL 2340 

7021 ACC GTG TGG AAG GCC GGC CGC AAG GAG GAG GCC ACC AAC CAG ACG GTG CTG ATC CGG AGT 7080 

2341 TVWKAGRKEEATNQTVLIRS 2360

7081 GGC CGG GTG CCC ATT GTG TCC TTG GAG TGT GTG TCC TGC AAG GCA CAG GCC GTG TAC GAA 7140 

2361 GRVPIVSLECVSCKAQAVYE 2380

7141 GTG AGC CGC AGC TCC TAC GTG TAC TTG GAG GGC CGC TGC CTC AAT TGC AGC AGC GGC TCC 7200 

2381 VSRSSYVYLEGRCLNCSSGS 2400 

7201 AAG CGA GGG CGG TGG GCT GCA CGT ACG TTC AGC AAC AAG ACG CTG GTG CTG GAT GAG ACC 7260 

2401 KR G RWAARTFSNKTLV L DET 2420 

7261 ACC ACA TCC ACG GGC AGT GCA GGC ATG CGA CTG GTG CTG CGG CGG GGC GTG CTG CGG GAC 7320 

2421 TTS TGSAGMRLVLRRG VLRD 2440 

7321 GGC GAG GGA TAC ACC TTC ACG CTC ACG GTG CTG GGC CGC TCT GGC GAG GAG GAG GGC TGC 7380

2441 GEGYTFTLTVLGRSGEEEGC 2460 

7381 GCC TCC ATC CGC CTG TCC CCC AAC CGC CCG CCG CTG GGG GGC TCT TGC CGC CTC TTC CCA 7440 

2461 AS I RLSPNRPPLGGSCRLFP 2480 

7441 CTG GGC GCT GTG CAC GCC CTC ACC ACC AAG GTG CAC TTC GAA TGC ACG GGC TGG CAT GAC 7500 

2481 LGAVHALT TKVHFECTGWHD 2500 

7501 GCG GAG GAT GCT GGC GCC CCG CTG GTG TAC GCC CTG CTG CTG CGG CGC TGT CGC CAG GGC 7560 

2501 AEDAGAPLVYALLLRRCRQG 2520
7561 CAC TGC GAG GAG TTC TGT GTC TAC AAG GGC AGC CTC TCC AGC TAC GGA GCC GTG CTG CCC 7620

2521 HCEEFCVYKGSLSSYGAVLP 2540 

7621 CCG GGT TTC AGG CCA CAC TTC GAG GTG GGC CTG GCC GTG GTG GTG CAG GAC CAG CTG GGA 7680 

2541 PG F RPHFEVGLA VVVQDQLG 2560 

7681 GCC GCT GTG GTC GCC CTC AAC AGG TCT TTG GCC ATC ACC CTC CCA GAG CCC AAC GGC AGC 7740 

2561 AAVVALNRSLAITLPEPNGS 2580

7741 GCA ACG GGG CTC ACA GTC TGG CTG CAC GGG CTC ACC GCT AGT GTG CTC CCA GGG CTG CTG 7800 

2581 ATG LTVWLHGLTA SVLPGLL 2600 

7801 CGG CAG GCC GAT CCC CAG CAC GTC ATC GAG TAC TCG TTG GCC CTG GTC ACC GTG CTG AAC 7860 

2601 RQADPQHVIEYSLALVTVLN 2620

7861 GAG TAC GAG CGG GCC CTG GAC GTG GCG GCA GAG CCC AAG CAC GAG CGG CAG CAC CGA GCC 7920 

2621 EYE RALDVAAEPKHE R QHR A 2640 

7921 CAG ATA CGC AAG AAC ATC ACG GAG ACT CTG GTG TCC CTG AGG GTC CAC ACT GTG GAT GAC 7980 

2641 QIRKNITETLVSLRVHTVDD 2660

7981 ATC CAG CAG ATC GCT GCT GCG CTG GCC CAG TGC ATG GGG CCC AGC AGG GAG CTC GTA TGC 8040

2661 I Q Q IAAALAQCMG P S R ELVC 2680 

8041 CGC TCG TGC CTG AAG CAG ACG CTG CAC AAG CTG GAG GCC ATG ATG CTC ATC CTG CAG GCA 8100 

2681 RSCLKQTLHKLEAMMLILQA 2700

8101 GAG ACC ACC GCG GGC ACC GTG ACG CCC ACC GCC ATC GGA GAC AGC ATC CTC AAC ATC ACA 8160 

2701 ETT AGTVTPTAIGDSI LNIT 2720 

8161 GGA GAC CTC ATC CAC CTG GCC AGC TCG GAC GTG CGG GCA CCA CAG CCC TCA GAG CTG GGA 8220 

2721 GDLIHLASSDVRAPQPSELG 2740

8221 GCC GAG TCA CCA TCT CGG ATG GTG GCG TCC CAG GCC TAC AAC CTG ACC TCT GCC CTC ATG 8280 

2741 AESPSRMVASQAYNLTSALM 2760

8281 CGC ATC CTC ATG CGC TCC CGC GTG CTC AAC GAG GAG CCC CTG ACG CTG GCG GGC GAG GAG 8340

2761 RIL MRSRVLNEE PLTLAGEE 2780 

8341 ATC GTG GCC CAG GGC AAG CGC TCG GAC CCG CGG AGC CTG CTG TGC TAT GGC GGC GCC CCA 8400 

2781 I VA QGKRSDPRSLLCYGGAP 2800 

8401 GGG CCT GGC TGC CAC TTC TCC ATC CCC GAG GCT TTC AGC GGG GCC CTG GCC AAC CTC AGT 8460 

2801 GPGCHFSIPEAFSGALANLS 2820

8461 GAC GTG GTG CAG CTC ATC TTT CTG GTG GAC TCC AAT CCC TTT CCC TTT GGC TAT ATC AGC 8520 

2821 DVVQLIFLVDSNPFPFGYIS 2840 

8521 AAC TAC ACC GTC TCC ACC AAG GTG GCC TCG ATG GCA TTC CAG ACA CAG GCC GGC GCC CAG 8580 

2841 NYTVSTKVASMAFQTQAGAQ 2860 

8581 ATC CCC ATC GAG CGG CTG GCC TCA GAG CGC GCC ATC ACC GTG AAG GTG CCC AAC AAC TCG 8640 

2861 IPIERLASERAITVKVPNNS 2880

8641 GAC TGG GCT GCC CGG GGC CAC CGC AGC TCC GCC AAC TCC GCC AAC TCC GTT GTG GTC CAG 8700

2881 D W A ARGHRSSANSANSVVVQ 2900 

8701 CCC CAG GCC TCC GTC GGT GCT GTG GTC ACC CTG GAC AGC AGC AAC CCT GCG GCC GGG CTG 8760 

2901 PQA SVGAVVTLDSSNPAAGL 2920 

8761 CAT CTG CAG CTC AAC TAT ACG CTG CTG GAC GGC CAC TAC CTG TCT GAG GAA CCT GAG CCC 8820 

2921 HLQLNYTLLDGHYLSEEPEP 2940
8821 TAC CTG GCA GTC TAC CTA CAC TCG GAG CCC CGG CCC AAT GAG CAC AAC TGC TCG GCT AGC

2941 YLAVYLHSEPRPNEHNCSAS 

8881 AGG AGG ATC CGC CCA GAG TCA CTC CAG GGT GCT GAC CAC CGG CCC TAC ACC TTC TTC ATT 

2961 RRIRPESLQGADHRPYTFFI

8941 TCC CCG GGG AGC AGA GAC CCA GCG GGG AGT TAC CAT CTG AAC CTC TCC AGC CAC TTC CGC 

2981 SPGSRDPAGSYHLNLSSHFR

9001 TGG TCG GCG CTG CAG GTG TCC GTG GGC CTG TAC ACG TCC CTG TGC CAG TAC TTC AGC GAG 

3001 WSALQVSVGLYTSLCQYFSE 

9061 GAG GAC ATG GTG TGG CGG ACA GAG GGG CTG CTG CCC CTG GAG GAG ACC TCG CCC CGC CAG 

3021 EDMVWRTEGLLPLEET SPRQ 

9121 GCC GTC TGC CTC ACC CGC CAC CTC ACC GCC TTC GGC GCC AGC CTC TTC GTG CCC CCA AGC 

3041 AVCLTRHLTAFGASLFVPPS 

9181 CAT GTC CGC TTT GTG TTT CCT GAG CCG ACA GCG GAT GTA AAC TAC ATC GTC ATG CTG ACA 

3061 HVRFVFPEPTADVNYIVMLT

9241 TGT GCT GTG TGC CTG GTG ACC TAC ATG GTC ATG GCC GCC ATC CTG CAC AAG CTG GAC CAG 

3081 CAVCLVTYMVMAAILHKLDQ

9301 TTG GAT GCC AGC CGG GGC CGC GCC ATC CCT TTC TGT GGG CAG CGG GGC CGC TTC AAG TAC 

3101 LDA SRGRAI P FCGQRGR FKY 

9361 GAG ATC CTC GTC AAG ACA GGC TGG GGC CGG GGC TCA GGT ACC ACG GCC CAC GTG GGC ATC 

3121 E I L VKTGWGRGSGT TAHVG I 

9421 ATG CTG TAT GGG GTG GAC AGC CGG AGC GGC CAC CGG CAC CTG GAC GGC GAC AGA GCC TTC

3141 MLY GVDSRSGHRHLDGDRAF 

9481 CAC CGC AAC AGC CTG GAC ATC TTC CGG ATC GCC ACC CCG CAC AGC CTG GGT AGC GTG TGG 

3161 HRNSLDIFRIATPHSLGSVW

9541 AAG ATC CGA GTG TGG CAC GAC AAC AAA GGG CTC AGC CCT GCC TGG TTC CTG CAG CAC GTC 

3181 K I RVWHDNKGL SPAWFLQHV 

9601 ATC GTC AGG GAC CTG CAG ACG GCA CGC AGC GCC TTC TTC CTG GTC AAT GAC TGG CTT TCG 

3201 IVRDLQTARSAFFLVNDWLS

9661 GTG GAG ACG GAG GCC AAC GGG GGC CTG GTG GAG AAG GAG GTG CTG GCC GCG AGC GAC GCA 

3221 VETEANGGLVEKEVLAASDA

9721 GCC CTT TTG CGC TTC CGG CGC CTG CTG GTG GCT GAG CTG CAG CGT GGC TTC TTT GAC AAG

3241 ALLRFRRLLVAELQRGFFDK 

9781 CAC ATC TGG CTC TCC ATA TGG GAC CGG CCG CCT CGT AGC CGT TTC ACT CGC ATC CAG AGG 

3261 HIWLSIWDRPPRSRFTRIQR

9841 GCC ACC TGC TGC GTT CTC CTC ATC TGC CTC TTC CTG GGC GCC AAC GCC GTG TGG TAC GGG 

3281 ATCCVLLICLFLGANAVWYG

9901 GCT GTT GGC GAC TCT GCC TAC AGC ACG GGG CAT GTG TCC AGG CTG AGC CCG CTG AGC GTC 

3301 AVGDSAYSTGHVSRLSPLSV

9961 GAC ACA GTC GCT GTT GGC CTG GTG TCC AGC GTG GTT GTC TAT CCC GTC TAC CTG GCC ATC 

3321 DTVAVGLVSSVVVYPVYLA I 

10021 CTT TTT CTC TTC CGG ATG TCC CGG AGC AAG GTG GCT GGG AGC CCG AGC CCC ACA CCT GCC 

3341 LFLFRMSRSKVAGSPSPTPA 
10081 GGG CAG CAG GTG CTG GAC ATC GAC AGC TGC CTG GAC TCG TCC GTG CTG GAC AGC TCC TTC 10140

3361 GQQVLDIDSCLDSSVLDSSF 3380

10141 CTC ACG TTC TCA GGC CTC CAC GCT GAG CAG GCC TTT GTT GGA CAG ATG AAG ACT GAC TTG 10200 

3381 LTFSGLHAEQAFVGQMKSDL 3400

10201 TTT CTG GAT GAT TCT AAG AGT CTG GTG TGC TGG CCC TCC GGC GAG GGA ACG CTC AGT TGG 10260 

3401 FLDDSKSLVCWPSGEGTLSW 3420

10261 CCG GAC CTG CTC AGT GAC CCG TCC ATT GTG GGT AGC AAT CTG CGG CAG CTG GCA CGG GGC 10320

3421 PDLLSDPSIVGSNLRQLARG 3440

10321 CAG GCG GGC CAT GGG CTG GGC CCA GAG GAG GAC GGC TTC TCC CTG GCC AGC CCC TAC TCG 10380 

3441 QAGHGLGPEEDGF SLASPYS 3460 

10381 CCT GCC AAA TCC TTC TCA GCA TCA GAT GAA GAC CTG ATC CAG CAG GTC CTT GCC GAG GGG 10440 

3461 PAKSFSASDEDLIQQVLAEG 3480

10441 GTC AGC AGC CCA GCC CCT ACC CAA GAC ACC CAC ATG GAA ACG GAC CTG CTC AGC AGC CTG 10500 

3481 V S S PAPTQDTHMETDLLSSL 3500 

10501 TCC AGC ACT CCT GGG GAG AAG ACA GAG ACG CTG GCG CTG CAG AGG CTG GGG GAG CTG GGG 10560 

3501 SSTPGEKTETLALQRLGELG 3520

10561 CCA CCC AGC CCA GGC CTG AAC TGG GAA CAG CCC CAG GCA GCG AGG CTG TCC AGG ACA GGA 10620

3521 PPSPGLNWEQPQAARLSRTG 3540

10621 CTG GTG GAG GGT CTG CGG AAG CGC CTG CTG CCG GCC TGG TGT GCC TCC CTG GCC CAC GGG 10680 

3541 LVEGLRKRLLPAWCASLAHG 3560 

10681 CTC AGC CTG CTC CTG GTG GCT GTG GCT GTG GCT GTC TCA GGG TGG GTG GGT GCG AGC TTC 10740

3561 L SL LLVAVAVAVS GWVGASF 3580 

10741 CCC CCG GGC GTG AGT GTT GCG TGG CTC CTG TCC AGC AGC GCC AGC TTC CTG GCC TCA TTC 10800

3581 PPGVSVAWLLSSSASFLASF 3600

10801 CTC GGC TGG GAG CCA CTG AAG GTC TTG CTG GAA GCC CTG TAC TTC TCA CTG GTG GCC AAG 10860 

3601 LGWEPLKVLLEALYFSLVAK 3620 

10861 CGG CTG CAC CCG GAT GAA GAT GAC ACC CTG GTA GAG AGC CCG GCT GTG ACG CCT GTG AGC 10920 

3621 RLH P D E D D T L V E S P A V T P V S 3640 

10921 GCA CGT GTG CCC CGC GTA CGG CCA CCC CAC GGC TTT GCA CTC TTC CTG GCC AAG GAA GAA 10980

3641 ARV PRVRPPHGFALFLAKEE 3660 

10981 GCC CGC AAG GTC AAG AGG CTA CAT GGC ATG CTG CGG AGC CTC CTG GTG TAC ATG CTT TTT 11040 

3661 ARKVKRLHGMLRSLLVYMLF 3680

11041 CTG CTG GTG ACC CTG CTG GCC AGC TAT GGG GAT GCC TCA TGC CAT GGG CAC GCC TAC CGT 11100 

3681 LLVTLLASYGDASCHGHAYR 3700

11101 CTG CAA AGC GCC ATC AAG CAG GAG CTG CAC AGC CGG GCC TTC CTG GCC ATC ACG CGG TCT 11160 

3701 LQSAIKQELHSRAFLAITRS 3720 

11161 GAG GAG CTC TGG CCA TGG ATG GCC CAC GTG CTG CTG CCC TAC GTC CAC GGG AAC CAG TCC 11220 

3721 E ELW PWMAHVLLPYVHGNQS 3740 

11221 AGC CCA GAG CTG GGG CCC CCA CGG CTG CGG CAG GTG CGG CTG CAG GAA GCA CTC TAC CCA 11280 

3741 S PELGPPRLR QVRLQEALYP 3760 

11281 GAC CCT CCC GGC CCC AGG GTC CAC ACG TGC TCG GCC GCA GGA GGC TTC AGC ACC AGC GAT 11340 

3761 DPPGPRVHTCSAAGGFSTSD 3780
11341 TAC GAC GTT GGC TGG GAG AGT CCT CAC AAT GGC TCG GGG ACG TGG GCC TAT TCA GCG CCG 11400

3781 YDVGWESPHNGSGTWAYSAP 3800

11401 GAT CTG CTG GGG GCA TGG TCC TGG GGC TCC TGT GCC GTG TAT GAC AGC GGG GGC TAC GTG 11460 

3801 DLLGAWSWGSCAVYDSGGYV 3820

11461 CAG GAG CTG GGC CTG AGC CTG GAG GAG AGC CGC GAC CGG CTG CGC TTC CTG CAG CTG CAC 11520

3821 QEL GLSLEESRDRLRF LQLH 3840 

11521 AAC TGG CTG GAC AAC AGG AGC CGC GCT GTG TTC CTG GAG CTC ACG CGC TAC AGC CCG GCC 11580 

3841 NWL DNRSRAVFLELTRYSPA 3860 

11581 GTG GGG CTG CAC GCC GCC GTC ACG CTG CGC CTC GAG TTC CCG GCG GCC GGC CGC GCC CTG 11640 

3861 VGLHAAVTLRLEFPAAGRAL 3880

11641 GCC GCC CTC AGC GTC CGC CCC TTT GCG CTG CGC CGC CTC AGC GCG GGC CTC TCG CTG CCT 11700 

3881 A AL SVRPFALRRLSAGLSLP 3900 

11701 CTG CTC ACC TCG GTG TGC CTG CTG CTG TTC GCC GTG CAC TTC GCC GTG GCC GAG GCC CGT 11760 

3901 LL T SVCLLLFAVHFAVAEAR 3920 

11761 ACT TGG CAC AGG GAA GGG CGC TGG CGC GTG CTG CGG CTC GGA GCC TGG GCG CGG TGG CTG 11820

3921 TWHREGRWRVLRLGAWARWL 3940

11821 CTG GTG GCG CTG ACG GCG GCC ACG GCA CTG GTA CGC CTC GCC CAG CTG GGT GCC GCT GAC 11880 

3941 LVALTAATALVRLAQLGAAD 3960

11881 CGC CAG TGG ACC CGT TTC GTG CGC GGC CGC CCG CGC CGC TTC ACT AGC TTC GAC CAG GTG 11940

3961 RQW TRFVRGRPRRFTS FDQV 3980 

11941 GCG CAC GTG AGC TCC GCA GCC CGT GGC CTG GCG GCC TCG CTG CTC TTC CTG CTT TTG GTC 12000

3981 AHVSSAARGLAASLLFLLLV 4000

12001 AAG GCT GCC CAG CAC GTA CGC TTC GTG CGC CAG TGG TCC GTC TTT GGC AAG ACA TTA TGC 12060 

4001 KAA QHVRFVRQWSVFG KTLC 4020 

12061 CGA GCT CTG CCA GAG CTC CTG GGG GTC ACC TTG GGC CTG GTG GTG CTC GGG GTA GCC TAC 12120 

4021 RAL PELLGVTLGLVVLGVAY 4040 

12121 GCC CAG CTG GCC ATC CTG CTC GTG TCT TCC TGT GTG GAC TCC CTC TGG AGC GTG GCC CAG 12180 

4041 AQLAILLVSSCVDSLWSVAQ 4060

12181 GCC CTG TTG GTG CTG TGC CCT GGG ACT GGG CTC TCT ACC CTG TGT CCT GCC GAG TCC TGG 12240

4061 ALL VLCPGTGLSTLCPAESW 4080 

12241 CAC CTG TCA CCC CTG CTG TGT GTG GGG CTC TGG GCA CTG CGG CTG TGG GGC GCC CTA CGG 12300 

4081 HLSPLLCVGLWALRLWGALR 4100

12301 CTG GGG GCT GTT ATT CTC CGC TGG CGC TAC CAC GCC TTG CGT GGA GAG CTG TAC CGG CCG 12360 

4101 LGAVILRWRYHALRGELYRP 4120

12361 GCC TGG GAG CCC CAG GAC TAC GAG ATG GTG GAG TTG TTC CTG CGC AGG CTG CGC CTC TGG 12420

4121 AWE PQDYEMVELFLRRLRLW 4140 

12421 ATG GGC CTC AGC AAG GTC AAG GAG TTC CGC CAC AAA GTC CGC TTT GAA GGG ATG GAG CCG 12480 

4141 MGL SKVKEFRHKVRFEGMEP 4160 

12481 CTG CCC TCT CGC TCC TCC AGG GGC TCC AAG GTA TCC CCG GAT GTG CCC CCA CCC AGC GCT 12540

4161 LPSRSSRGSKVSPDVPPPSA 4180

12541 GGC TCC GAT GCC TCG CAC CCC TCC ACC TCC TCC AGC CAG CTG GAT GGG CTG AGC GTG AGC 12600 

4181 GSDASHPSTSSSQLDGLSVS 4200
12601 CTG GGC CGG CTG GGG ACA AGG TGT GAG CCT GAG CCC TCC CGC CTC CAA GCC GTG TTC GAG 12660

4201 LGRLGTRCEPEPSR LQAVFE 4220 

12661 GCC CTG CTC ACC CAG TTT GAC CGA CTC AAC CAG GCC ACA GAG GAC GTC TAC CAG CTG GAG 12720 

4221 ALLTQFDRLNQATEDVYQLE 4240

12721 CAG CAG CTG CAC AGC CTG CAA GGC CGC AGG AGC AGC CGG GCG CCC GCC GGA TCT TCC CGT 12780 

4241 QQLHSLQGRRSSRAPAGSSR 4260

12781 GGC CCA TCC CCG GGC CTG CGG CCA GCA CTG CCC AGC CGC CTT GCC CGG GCC AGT CGG GGT 12840

4261 GPSPGLRPALPSRLARASRG 4280

12841 GTG GAC CTG GCC ACT GGC CCC AGC AGG ACA CCC CTT CGG GCC AAG AAC AAG GTC CAC CCC 12900 

4281 VDLATGPSRTPLRAKNKVHP 4300



12901 AGC AGC ACT TAG

4301 SST*
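Each amino acid row in the listing above is the one-letter translation, under the standard genetic code, of the codon row printed above it, with `*` marking the TAG stop codon at the end. A minimal Python sketch (the names `CODON_TABLE` and `translate_row` are hypothetical and not part of the disclosure) reproduces the first and last rows and rejects any token that is not a valid DNA triplet, which can help flag extraction-damaged entries when checking the listing.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order at each position,
# so TTT, TTC, TTA, TTG, TCT, ... map to F, F, L, L, S, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_row(codon_row: str) -> str:
    """Translate a whitespace-separated row of codons into one-letter amino
    acid codes ('*' = stop). Raises ValueError on any token that is not a
    valid DNA triplet, e.g. an extraction-damaged codon."""
    residues = []
    for token in codon_row.split():
        codon = token.upper()
        if codon not in CODON_TABLE:
            raise ValueError(f"not a valid codon: {token!r}")
        residues.append(CODON_TABLE[codon])
    return "".join(residues)


first_row = "ATG CCG CCC GCC GCG CCC GCC CGC CTG GCG CTG GCC CTG GGC CTG GGC CTG TGG CTC GGG"
print(translate_row(first_row))          # MPPAAPARLALALGLGLWLG
print(translate_row("AGC AGC ACT TAG"))  # SST*
```

The position numbers at the start and end of each printed row are not codons, so they should be stripped before a row is passed to the function.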
[Figure: amino acid sequence of the PKD1 gene product (residues 1 to 2640 shown), annotated with the signal peptide, the leucine-rich repeat (LRR) region and its cysteine-rich flanking regions, a C-type lectin binding domain, and numbered PKD1 repeats (for example PKD1 R2, R3, R12 and R13).]
IDENTIFICATION OF POLYCYSTIC KIDNEY
DISEASE GENE, DIAGNOSTICS AND TREATMENT

This is a continuation-in-part of U.S. Serial No.
08/253,524, filed June 3, 1994, which is incorporated by
reference herein in its entirety.

1. INTRODUCTION 
The present invention relates to the identification of
the gene, referred to as the PKD1 gene, mutations in which
are responsible for the vast majority of cases of
autosomal dominant polycystic kidney disease (ADPKD). The
PKD1 gene, including the complete nucleotide sequence of the
gene's coding region, is presented. Further, the complete
PKD1 gene product amino acid sequence and protein structure,
and antibodies directed against the PKD1 gene product, are
also presented. Additionally, the present invention relates
to therapeutic methods and compositions for the treatment of
ADPKD symptoms. Methods are also presented for the
identification of compounds that modulate the level of
expression of the PKD1 gene or the activity of mutant PKD1
gene product, and for the evaluation and use of such compounds
in the treatment of ADPKD symptoms. Still further, the present
invention relates to prognostic and diagnostic, including
prenatal, methods and compositions for the detection of
mutant PKD1 alleles and/or abnormal levels of PKD1 gene
product or gene product activity.

2. BACKGROUND OF THE INVENTION 
Autosomal dominant polycystic kidney disease (ADPKD) is
among the most prevalent dominant human disorders, affecting
between 1 in 1,000 and 1 in 3,000 individuals worldwide
(Dalgaard, O.Z., 1957, Acta Med. Scand. 158:1-251). The
major manifestation of the disorder is the progressive cystic
dilation of renal tubules (Gabow, P.A., 1990, Am. J. Kidney
Dis. 16:403-413), leading to renal failure in half of
affected individuals by age 50.
ADPKD-associated renal cysts may enlarge to contain
several liters of fluid, and the kidneys usually enlarge
progressively, causing pain. Other abnormalities, such as
pain, hematuria, renal and urinary infection, renal tumors,
salt and water imbalance and hypertension, frequently result
from the renal defect. Cystic abnormalities in other organs,
including the liver, pancreas, spleen and ovaries, are
commonly found in ADPKD. Massive liver enlargement
occasionally causes portal hypertension and hepatic failure.
Cardiac valve abnormalities and an increased frequency of
subarachnoid and other intracranial hemorrhage have also been
observed in ADPKD. Progressive renal failure causes death in
many ADPKD patients, and dialysis and transplantation are
frequently required to maintain life in these patients.
Although end-stage renal failure usually supervenes in middle
age (ADPKD is sometimes called adult polycystic kidney
disease), children may occasionally have severe renal cystic
disease.

Although studies of kidneys from ADPKD patients have
demonstrated a number of different biochemical, structural
and physiological abnormalities, the disorder's underlying
causative biochemical defect remains unknown. Biochemical
abnormalities which have been observed have involved protein
sorting, the distribution of cell membrane markers within
renal epithelial cells, extracellular matrix, ion transport,
epithelial cell turnover, and epithelial cell proliferation.
The most carefully documented of these findings are
abnormalities in the composition of tubular epithelial cells
and a reversal of the normal polarized distribution of cell
membrane proteins, such as the Na+/K+-ATPase (Carone, F.A., et
al., 1994, Lab. Inv. 70:437-448).

As the name implies, ADPKD is inherited as an autosomal
dominant disorder. Three distinct loci have been shown to
cause phenotypically indistinguishable forms of the disease,
with greater than 85-90% of disease incidence being due to
mutations which map to the short arm of chromosome 16, as
discussed below. Despite intensive investigation, the
molecular defect responsible for ADPKD is not known.

In 1985, Reeders et al. (Reeders et al., 1985, Nature
317:542) carried out genetic linkage studies of a large number
of ADPKD families and demonstrated that a gene on the short
arm of chromosome 16 was mutated in most cases of ADPKD.
This gene has been designated PKD1 by the Nomenclature
Committee of the Human Gene Mapping Workshop and the Genome
Data Base of the Welch Library, Johns Hopkins University.

Further linkage studies have identified a set of genetic
markers that flank the gene-rich region containing the PKD1
gene (Reeders et al., 1988, Genomics 3:150; Somlo et al.,
1992, Genomics 12:152; Breuning et al., 1990, J. Med. Genet.
27:603; Germino et al., 1990, Am. J. Hum. Genet. 46:925).
These markers have been mapped by a variety of physical
mapping techniques, including fluorescence in situ
hybridization and pulsed-field gel electrophoresis (Gillespie
et al., 1990, Nucleic Acids Research 18:7071). It has been
shown that the closest distal genetic marker (D16S259, on the
telomeric side of the PKD1 locus) lies within 750 kb of the
closest proximal genetic marker (D16S25, on the centromeric
side of the PKD1 locus). The interval between the genetic
markers has been cloned in a series of overlapping cosmid and
bacteriophage genomic clones (Germino et al., 1992, Genomics
13:144), which contain the entire PKD1 interval, with the
exception of two gaps of less than 10 kb and less than 50 kb.
Restriction mapping of these clones has confirmed that the
interval between the flanking genetic markers is 750 kb.

While genetic mapping studies such as these have begun
to narrow the region within the human genome in which the
gene responsible for ADPKD lies, an estimated twenty or more
genes lie within this 750 kb interval. Given the
prevalence and severity of ADPKD, however, it is of great
importance to elucidate which, if any, of these postulated
genes corresponds to PKD1.
3. SUMMARY OF THE INVENTION
The present invention relates to methods and
compositions for the diagnosis and treatment of autosomal
dominant polycystic kidney disease (ADPKD). Specifically, a
novel gene, referred to as the PKD1 gene, is described in
Section 5.1. Mutations within the PKD1 gene are responsible
for approximately 90% of cases of ADPKD. Additionally, the PKD1
gene product, including the nucleotide sequence of the
complete coding region, is described in Section 5.2.
Antibodies directed against the PKD1 gene product are
described in Section 5.3.

Further, the present invention relates to therapeutic
methods and compositions for the amelioration of ADPKD
symptoms. These therapeutic techniques are described in
Sections 5.9 and 5.10. Methods are additionally presented
for the identification of compounds that modulate the level
of expression of the PKD1 gene or the activity of PKD1 mutant
gene products, and for the evaluation and use of such compounds
as therapeutic ADPKD treatments. Such methods are described
in Section 5.8.

Still further, the present invention relates to
prognostic and diagnostic, including prenatal, methods and
compositions whereby the PKD1 gene and/or gene product can be
used to identify individuals carrying mutant PKD1 alleles or
exhibiting an abnormal level of PKD1 gene product or gene
product activity. Additionally, the present invention
describes methods which diagnose subjects exhibiting ADPKD
symptoms. Such techniques are described in Section 5.12.
Additionally, the present invention relates to the use of
PKD1 animal knockout screening assays for the identification
of compounds useful for the amelioration of ADPKD symptoms.

The coding region of the PKD1 gene is complex and
extensive, having a size of approximately 60 kb and
containing a total of 46 exons, the sequence of which, until
now, has been difficult to obtain for a number of reasons.
First, the majority (approximately the first two thirds) of
the PKD1 gene is duplicated several times in a transcribed
4. DESCRIPTION OF THE FIGURES
FIG. 1. A map of the PKD1 interval showing the cosmid
and bacteriophage clones covering the region (taken from
Germino et al., 1992, Genomics 13:144). The PKD1 region, as
defined by flanking markers, extends from D16S259 (pGGG1) to
D16S25, a span of approximately 750 kb. Single-copy probes
used in pulsed-field gel mapping of the region are shown
above the line (pGGG1, CMM65b, etc.). C, M, P, N and B are
sites for the restriction enzymes ClaI, MluI, PvuI, NotI and
BssHII, respectively. Sites that cleave genomic DNA from
only some tissues are shown in parentheses. Bold bars (a-z,
aa) represent the extents of the coding regions (see Table
2). Horizontal lines 1-38 represent the cosmid and phage
clones spanning the PKD1 region, as follows:

1=cJC1
2=cJC2
3=cDEB1
4=cDEB4
5=cDEB7
6=cDEB8
7=cDEB9
8=cDEB10
9=cDEB11
10=cGGG10
11=cGGG1
12=cGGG2
13=cGGG3
14=cGGG4a
15=cGGG4b
16=cGGG6
17=cKLH4
18=cKLH6
19=cKLH7
20=cKLH8
21=cKLH9
22=cNK32
23=cNK31
24=cGGG8
25=cNK30
26=λLCN1w1
27=λLCNw2J2
28=λLCNw1w3
29=λLCNw5.2
30=λNK92.6w5.
31=λNK92.6w4.
32=cNK92.6w1.
33=cNK92
34=cNK92
35=cNK63
36=cNK14
37=cGOS4
38=cCOS3



FIG. 2. A map of the PKD1 region as defined by flanking
markers. The region extends from D16S259 (pGGG1) to w5.2CA,
a microsatellite repeat that lies within λLCNw5.2, a span of
approximately 480 kb. The labels are as for FIG. 1.



FIG. 3A-B. Genomic DNA from 40 unrelated ADPKD patients
was amplified by PCR for SSCP analysis. Primers F23 and R23
(see Table 1, below) were used to amplify an exon of 298 bp.
Variant SSCP patterns were seen in two ADPKD patients under
the following conditions. Each of the patients was
heterozygous for the normal pattern and the variant pattern.
The pattern seen in these patients was not seen in normal
individuals. The arrow indicates non-denatured DNA.

FIG. 4. A map (not to scale), derived from the cosmid
contig cGGG1, cGGG10 and cDEB11, of the genomic region
containing the PKD1 gene. The horizontal black bars show the
positions of the three cosmids. The discontinuities in these
bars indicate that the full extents of cGGG1 and cDEB11 are
not shown. The map was constructed using restriction enzyme
data from several enzymes. BamHI, EcoRI and NotI restriction
sites are shown. The numbers below the horizontal line
represent distances in kilobases between adjacent restriction
sites. The PKD1 cDNA clones are shown above as grey bars.
These clones hybridize to the restriction fragments shown
immediately below them in the genomic map.

FIG. 5A. Structure of the PKD1 gene transcript. The bar
at the top represents the PKD1 exon map. A total of 46 exons
were identified. Below the gene transcript map are
depictions of the overlapping cDNA clones, with putative
alternatively spliced regions as indicated.

FIGS. 5B-5C. PKD1 exons. This chart lists PKD1 exon
sizes and indicates which cDNA clones contain nucleotide
sequences corresponding to sequences present within specific
exons.

FIG. 6. PKD1 nucleotide and amino acid sequences.
Depicted herein are, top line, the nucleotide sequence of the
entire PKD1 coding region (SEQ ID NO: 1), and, bottom line,
the PKD1 derived amino acid sequence (SEQ ID NO: 2), given in
the one-letter amino acid code.

FIGS. 7A-7B. The derived amino acid sequence of the PKD1 gene
product (SEQ ID NO: 2). The putative peptide domains of the
PKD1 gene product are depicted underneath the amino acid
sequence.
FIG. 8. A schematic representation of the PKD1 gene
product, with each of its putative domains illustrated.

FIG. 9. SSCP analysis. Genomic DNA from a total of 60
unrelated ADPKD patients was amplified by PCR for SSCP
analysis. Intronic primers F25 and Mill-lR (see Section
10.1, below) were used for amplification. A variant SSCP
pattern was seen in one individual. The amplified DNA from
this individual was then reamplified with the intronic
primers KG8-F31 and KG8-R35 (see Section 10.1, below). Both
strands of the reamplified DNA were sequenced, using F25 and
MilfclR as sequencing primers. As discussed in Section 10.2,
below, sequencing revealed a C to T transition which created
a stop codon at PKD1 amino acid position 765. The pattern
seen in this patient was not seen in normal individuals.
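The codon arithmetic behind a statement such as "a stop codon at amino acid position 765" can be illustrated with a short Python sketch. The helper names are hypothetical and are not part of the described methods; the sketch assumes that codon numbering starts at the A of the ATG initiation codon and considers only C to T changes on the coding strand. Which codon was affected at position 765 is not stated here, so none is assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_span(aa_position: int) -> tuple[int, int]:
    """1-based, inclusive coding-sequence coordinates of codon `aa_position`,
    counting from the first base of the ATG initiation codon (assumed)."""
    if aa_position < 1:
        raise ValueError("amino acid positions start at 1")
    start = 3 * (aa_position - 1) + 1
    return start, start + 2


def c_to_t_creates_stop(codon: str) -> list[int]:
    """Offsets (0-2) within `codon` at which a single C->T transition
    converts the codon into a stop codon."""
    codon = codon.upper()
    hits = []
    for i, base in enumerate(codon):
        if base == "C":
            mutant = codon[:i] + "T" + codon[i + 1:]
            if mutant in STOP_CODONS:
                hits.append(i)
    return hits


if __name__ == "__main__":
    print(codon_span(765))  # (2293, 2295)
    bases = "ACGT"
    susceptible = sorted(
        a + b + c for a in bases for b in bases for c in bases
        if c_to_t_creates_stop(a + b + c)
    )
    print(susceptible)  # ['CAA', 'CAG', 'CGA']
```

Enumerating all 64 codons shows that only the glutamine codons CAA and CAG and the arginine codon CGA can become stop codons through a single C to T transition on the coding strand.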

5. DETAILED DESCRIPTION OF THE INVENTION 
Methods and compositions for the diagnosis and treatment of
autosomal dominant polycystic kidney disease (ADPKD) are
described herein. Specifically, the gene, referred to herein
as the PKD1 gene, in which mutations occur that are
responsible for the vast majority of ADPKD cases, is
described. Further, the PKD1 gene product and antibodies
directed against the PKD1 gene product are also presented.
Therapeutic methods and compositions are described for the
treatment and amelioration of ADPKD symptoms. Further,
methods for the identification of compounds that modulate the
level of expression of the PKD1 gene or the activity of
mutant PKD1 gene product, and for the evaluation and use of
such compounds in the treatment of ADPKD symptoms, are also
provided.

Still further, prognostic and diagnostic methods are
described for the detection of mutant PKD1 alleles, of
abnormal levels of PKD1 gene product or of gene product
activity.
5.1. THE PKD1 GENE
The PKD1 gene, mutations in which are responsible for
more than 9 in 10 cases of ADPKD, is described herein.
Specifically, the strategy followed to identify the PKD1 gene
is briefly discussed, as is the strategy for obtaining the
complete nucleotide sequence of the gene. Further, the PKD1
nucleotide sequence and alternative splicing features are
described. Still further, nucleic acid sequences that
hybridize to the PKD1 gene and which may be utilized as
therapeutic ADPKD treatments and/or as part of diagnostic
methods are described. Additionally, methods for the
production or isolation of such PKD1 nucleic acid molecules
and PKD1-hybridizing molecules are described.

5.1.1. IDENTIFICATION OF THE PKD1 GENE

Prior to the present invention, it was known only that
the PKD1 gene lay somewhere within a 750 kb region on
the short arm of chromosome 16. As presented herein, the
interval in which this gene lies has now been narrowed, and
the PKD1 gene itself has been identified within this large
stretch of DNA.

Briefly, the strategy followed to identify the PKD1 gene
was as follows. First, as demonstrated in the Example
presented in Section 6, below, the 750 kb PKD1 interval was
substantially narrowed to approximately 460 kb via genetic
linkage studies. Next, as shown in the Example presented in
Section 7, below, a maximum of 27 transcriptional units
(TUs) were identified within this approximately 460 kb PKD1
interval. The total length of these TUs was approximately
300 kb. Thus, the region containing the PKD1 coding region
was narrowed to a region of approximately 300 kb.

Next, as presented in the Example shown in Section 9,
below, a Northern analysis was conducted with mRNA isolated
from normal and ADPKD patient kidney tissue in order to
compare the pattern of ADPKD pathology with the
expression profile of the TUs within the PKD1 interval. One
of the TUs, Nik9, was eliminated by this analysis, which
indicated undetectable expression in the kidney and liver.
In addition, as demonstrated in the Example presented in
Section 9, below, a systematic search was undertaken using
several independent techniques, including Southern analysis,
SSCP, DGGE and direct sequencing of coding sequences, to
detect mutations in ADPKD patients within the TUs of the PKD1
region. This mutation screen excluded more than 80% of the
combined identified coding sequences in the PKD1 region,
further narrowing the region in which the PKD1 gene could
lie. The screen was initially performed on individual genes
until virtually all the coding sequences were shown to be
devoid of mutations. The search for PKD1 candidates was
further focused by the recognition that PKD1 shows one of the
highest new mutation rates known for human diseases. Based
on this observation, it was hypothesized that either the PKD1
gene contained a highly mutable site or that the gene
presented a large number of potential mutation sites, each
mutable at a regular frequency. This hypothesis is
supported by the absence of substantial linkage
disequilibrium among selected population groups. Further,
the hypothesis predicted that if the PKD1 gene encoded a
small transcript, it should contain a highly mutable element.

Trinucleotide repeat expansion represents one of the major
sources of dominant mutations such as the ADPKD-causing
mutations which arise in the PKD1 gene. A systematic search
for such highly mutable trinucleotide repeats was conducted
within the TUs in the remaining region in which PKD1 could
lie, but no such repeats were identified.
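A computational version of such a repeat search can be sketched in Python. This is a hypothetical helper, not the procedure used in the Examples: it reports only perfect tandem runs of a three-base unit, the minimum of five units is an assumed threshold, and interrupted (imperfect) repeats are not reported.

```python
import re


def find_trinucleotide_repeats(seq: str, min_units: int = 5) -> list[tuple[int, str, int]]:
    """Report perfect tandem runs of a three-base unit.

    Returns (0-based start, repeat unit, number of units) for each run of at
    least `min_units` copies. Mononucleotide runs (e.g. AAA...) are skipped.
    """
    if min_units < 2:
        raise ValueError("min_units must be at least 2")
    # ([ACGT]{3}) captures a unit; \1{n,} requires n further identical copies.
    pattern = re.compile(r"([ACGT]{3})\1{%d,}" % (min_units - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue
        hits.append((match.start(), unit, len(match.group(0)) // 3))
    return hits


if __name__ == "__main__":
    example = "GG" + "CCT" * 6 + "GA"  # toy sequence, not PKD1 data
    print(find_trinucleotide_repeats(example))  # [(2, 'CCT', 6)]
```

The regular-expression backreference keeps the sketch short; a production scan would also tolerate interruptions and report repeats in every reading phase.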

The only other explanation for the high mutation rate
is that the gene is physically large and presents
a large target for mutations. Of the TUs, nik823, within the
potential PKD1 region that had not been excluded by other
means, only two were of a size that could potentially support
such a high mutation rate. As demonstrated in the Example
presented below in Section 9, a search for ADPKD-associated
mutations within one of these TUs failed to identify any such
mutations, causing it to be excluded as a candidate PKD1
gene. Ultimately, as demonstrated in the Example presented in
Section 10, below, one of these polymorphisms has been shown
to be a de novo mutation which is predicted to lead to the
production of a truncated PKD1 protein in the affected
individual. These findings are highly suggestive, if not
proof, that the identified gene is the PKD1 gene.

Thus, the Examples presented below in Sections 6 through 11
demonstrate, through a variety of techniques, the genetic and
molecular characterization of the PKD1 region, and ultimately
demonstrate that the PKD1 gene, dominant mutations in which
cause ADPKD, has been identified.

5.1.2. SEQUENCING OF THE PKD1 GENE
As discussed below in Section 5.1.3, the nucleotide
sequence of the entire coding region of the PKD1 gene has now
been successfully isolated and sequenced. To achieve this
goal, however, a number of PKD1-specific impediments had to
be overcome. The strategy for obtaining the PKD1 gene
sequence is discussed briefly in this Section. The Example
presented below, in Section 11, discusses this sequencing
strategy in more detail.
First, the PKD1 gene is very large (approximately 60 kb),
as is the PKD1 transcript (approximately 14.5 kb in length).
In addition to this size difficulty, the 5' two thirds of
the gene is duplicated several times in a highly similar,
transcribed fashion elsewhere in the human genome (Germino,
G.G. et al., 1992, Genomics 13:144-151; European Chromosome
16 Tuberous Sclerosis Consortium, 1993, Cell 75:1305-1315).

The near-identity of the sequences of cDNA derived from
PKD1 and from the PKD1-like duplications made it very
unlikely that a full-length PKD1 transcript could be pieced
together merely by screening cDNA libraries via
hybridization. Such a screening method would be as likely to
identify transcripts originating from the PKD1-like
duplicated regions as from the authentic PKD1 locus. In fact,
if each of the duplicated loci were as transcriptionally
active as the authentic PKD1 locus, the representation of
authentic PKD1 cDNA clones among the total positive clones
would be very low.
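The dilution effect just described can be made concrete with a short calculation. The Python sketch assumes, as the text does hypothetically, that the authentic locus and each of n duplicated loci are equally transcribed and equally detected by the probe; the values of n are illustrative only, since the number of duplicated copies is given here only as "several".

```python
def authentic_fraction(n_duplicates: int) -> float:
    """Expected share of hybridization-positive cDNA clones that come from the
    authentic locus, assuming the authentic locus and each of `n_duplicates`
    duplicated loci are transcribed and detected equally (an assumption)."""
    if n_duplicates < 0:
        raise ValueError("n_duplicates must be non-negative")
    return 1.0 / (n_duplicates + 1)


def chance_any_authentic(n_duplicates: int, n_clones: int) -> float:
    """Probability that at least one of `n_clones` independently picked
    positive clones is authentic, under the same equal-representation assumption."""
    p = authentic_fraction(n_duplicates)
    return 1.0 - (1.0 - p) ** n_clones


if __name__ == "__main__":
    for n in (1, 3, 5):  # illustrative numbers of duplicated loci
        print(n, round(authentic_fraction(n), 3))
```

Even when an authentic clone is present among the positives, it still has to be told apart from its near-identical counterparts, which hybridization alone cannot do.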

Thus, a strategy was developed for obtaining the
authentic PKD1 sequence which included, first, obtaining the
highest-quality genomic sequence spanning the duplicated
region, as well as duplicate coverage of cDNA sequence
spanning the expected length of the PKD1 transcript; second,
comparing the cDNA sequences to the genomic sequence spanning
the duplicated region, thus identifying PKD1 exons; and,
finally, assembling the identified exons into a full-length
PKD1 coding sequence. The isolation of both PKD1 genomic and
cDNA sequence and, further, the alignment of such sequences,
however, proved to be very difficult.
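The second step, comparing cDNA with genomic sequence to delimit exons, can be sketched as follows. This hypothetical Python function splits a cDNA into consecutive blocks that occur exactly, and in order, in a genomic sequence, each block approximating an exon. It assumes error-free sequence and ignores the PKD1-like duplicates. Because it extends matches greedily, a boundary can be misplaced by a few bases when the start of an intron happens to match the start of the next exon; practical exon mapping uses spliced alignment that tolerates mismatches and checks splice-site consensus.

```python
def map_exons(cdna: str, genomic: str, min_block: int = 8) -> list[tuple[int, int, int]]:
    """Greedily split `cdna` into consecutive blocks that occur exactly, and in
    order, in `genomic`. Each block approximates one exon.

    Returns (cdna_start, genomic_start, length) tuples, 0-based.
    Quadratic and exact-match only: a teaching sketch, not an aligner.
    """
    blocks = []
    c = g = 0  # current cDNA position; genomic search start (after last block)
    while c < len(cdna):
        best_len, best_pos = 0, -1
        length = min(min_block, len(cdna) - c)
        # Extend the block while the longer cDNA stretch still occurs downstream.
        while c + length <= len(cdna):
            pos = genomic.find(cdna[c:c + length], g)
            if pos == -1:
                break
            best_len, best_pos = length, pos
            length += 1
        if best_len == 0:
            raise ValueError(f"no exact genomic match for cDNA position {c}")
        blocks.append((c, best_pos, best_len))
        c += best_len
        g = best_pos + best_len
    return blocks


if __name__ == "__main__":
    # Toy gene: three exons separated by GT...AG introns (invented sequence).
    exons = ["ATGGCTAGCTTACG", "CATCCGTTAACCGG", "TTCAAGGCCTGA"]
    introns = ["GTAAGT" + "C" * 8 + "AG", "GTGAGT" + "T" * 8 + "AG"]
    genomic = "AAAA" + exons[0] + introns[0] + exons[1] + introns[1] + exons[2] + "AAAA"
    print(map_exons("".join(exons), genomic))
    # [(0, 4, 14), (14, 34, 14), (28, 64, 12)]
```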

PKD1 genomic DNA (which totals approximately 60 kb)
proved to be particularly difficult to characterize for a
number of reasons. First, portions of PKD1 genomic DNA
(specifically, regions within cosmid cGGG10) tended to be
preferentially subcloned. For example, screens for
trinucleotide repeats in the cGGG10 cosmid identified one
CCT-positive subclone in a Sau3A-generated library of cGGG10
subclones. This region was, however, vastly underrepresented
in both the Sau3A library (i.e., approximately 1 clone out of
over 10,000) and subsequent sheared cosmid libraries (in
which no such clones were isolated). A plasmid subclone
containing the region, G13, proved difficult to grow and to
sequence. Sequence analysis of the clone revealed a highly
monotonous series of purines (A and G). Such sequences are
thought to make the clone difficult to propagate stably in
bacteria. Thus, in order to ascertain the level of
representation of the cosmid, it was necessary to construct a
detailed physical map of the cGGG10 cosmid.
Second, genomic sequence within the PKD1 region is very
GC-rich (approximately 70%) and forms extensive, stable
secondary structures. These features of PKD1 genomic DNA made
the task of obtaining accurate nucleotide sequence very
difficult. Several alternative sequencing conditions,
including different polymerases, melting conditions,
polymerization conditions and combinations thereof, had to be
tried before such sequence was obtained. Even when reliable
nucleotide sequence became available, however, the extensive
amount of repeated sequence within the genomic DNA made the
alignment of sequence information very difficult. It
therefore became necessary, for accurate alignment of
sequences, to use the fine physical map which had been
created earlier.
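The GC richness noted above is simple to quantify. The Python sketch below is a hypothetical helper that computes the GC fraction over unambiguous bases and a sliding-window profile; the 100-base window and 50-base step are arbitrary illustrative choices. It does not predict secondary structure, which requires dedicated folding software.

```python
def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / counted


def gc_profile(seq: str, window: int = 100, step: int = 50) -> list[tuple[int, float]]:
    """GC fraction in windows of `window` bases, advancing by `step`.
    Returns (0-based window start, GC fraction) pairs."""
    return [(i, gc_content(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]


if __name__ == "__main__":
    toy = "GC" * 50 + "AT" * 50  # invented 200-base sequence
    print(gc_profile(toy))  # [(0, 1.0), (50, 0.5), (100, 0.0)]
```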

The sequencing of PKD1 cDNA also presented a number of
PKD1-specific difficulties. First, the 14 kb size of the
transcript made it impossible to isolate a single cDNA clone
containing the entire PKD1 transcript. Overlapping partial
cDNA clones, therefore, had to be obtained in order to piece
together an entire sequence. Partial cDNA clones were
obtained by sequencing the ends of one cDNA insert,
synthesizing probes using this sequence, and obtaining
overlapping cDNA clones by their hybridization to such
probes. Second, the PKD1 gene was poorly represented in
renal cDNA libraries, and, in fact, its expression appeared
to be low in a number of tissues, making the isolation of
PKD1 cDNA clones especially difficult.
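Piecing together overlapping partial cDNA clones amounts to merging sequences at their shared ends. The minimal Python sketch below assumes the clones are already ordered 5' to 3', that overlaps are exact, and that 20 bases (a hypothetical default) is enough to call an overlap. Real assembly tolerates sequencing errors and, for PKD1, must also reject clones derived from the duplicated loci.

```python
def merge_overlapping(left: str, right: str, min_overlap: int = 20) -> str:
    """Join two sequences whose end and start overlap exactly by at least
    `min_overlap` bases, preferring the longest such overlap."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    raise ValueError("no exact overlap of the required length")


def assemble_ordered_clones(clones: list[str], min_overlap: int = 20) -> str:
    """Assemble clones that are already ordered 5' to 3' into one contig."""
    if not clones:
        raise ValueError("no clones given")
    contig = clones[0]
    for clone in clones[1:]:
        contig = merge_overlapping(contig, clone, min_overlap)
    return contig


if __name__ == "__main__":
    full = "ATGGCTAGCTTACGCATCCGTTAACCGGTTCAAGGCCTGA"  # invented 40-base transcript
    clones = [full[0:18], full[12:30], full[25:40]]  # overlapping partial clones
    print(assemble_ordered_clones(clones, min_overlap=5) == full)  # True
```

Preferring the longest overlap mirrors how overlapping clones were chained from probe to probe; a short minimum overlap risks joining clones at a chance match.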

5.1.3. THE PKD1 GENE 
Described herein is the complete nucleotide sequence of
the extensive PKD1 gene coding region. Further, PKD1
alternative splicing features are discussed below.

The coding region of the PKD1 gene is complex and
extensive, containing a total of 46 exons and producing a
transcript of approximately 14 kb in length. FIG. 5A depicts
the structure of the PKD1 gene transcript. A total of 46
exons were identified within the PKD1 gene. Additionally,
sequence analysis of a number of cDNA clones reveals that
the gene may have alternatively spliced forms. FIGS. 5B-5C
show a table of exons, listing exon sizes and indicating
which cDNA clones contain nucleotide sequences corresponding
to sequences present within specific exons.

FIGS. 6A-6P depict the PKD1 nucleotide sequence.
Specifically, the top line of FIGS. 6A-6P shows the
nucleotide sequence of the entire PKD1 coding region (SEQ ID
NO: 1). The term "PKD1 gene", as used herein, refers to (a)
the nucleotide sequence depicted in FIGS. 6A-6P (SEQ ID NO:
1); (b) any DNA sequence that hybridizes to the complement of
the nucleotide sequence depicted in FIGS. 6A-6P (SEQ ID NO:
1) under highly stringent conditions, e.g., hybridization to
filter-bound DNA in 0.5 M NaHPO4, 7% sodium dodecyl sulfate
(SDS), 1 mM EDTA at 65°C, and washing in 0.1xSSC/0.1% SDS at
68°C (Ausubel, F.M. et al., eds., 1989, Current Protocols in
Molecular Biology, Vol. I, Greene Publishing Associates, Inc.,
and John Wiley & Sons, Inc., New York, at p. 2.10.3), and
which encodes a gene product functionally equivalent to the
PKD1 gene product (SEQ ID NO: 2) depicted in FIGS. 6A-6P;
and/or (c) any DNA sequence that hybridizes to the complement
of the nucleotide sequence depicted in FIGS. 6A-6P (SEQ ID
NO: 1) under less stringent conditions, such as moderately
stringent conditions, e.g., washing in 0.2xSSC/0.1% SDS at
42°C (Ausubel et al., 1989, supra), yet which still encodes a
gene product functionally equivalent to the PKD1 gene product
depicted in FIGS. 6A-6P (SEQ ID NO: 2).

The term "functionally equivalent" as used herein can
refer to: 1) a gene product or peptide having the biological
function of the PKD1 gene product depicted in FIGS. 6A-6P
and/or the biological function of a PKD1 peptide domain, as
depicted in FIGS. 7A-7B and 8; 2) a gene product containing
at least one PKD1 peptide domain as depicted in FIGS. 7A-7B
and 8; or 3) a gene product having 80% overall amino acid
residue similarity to the PKD1 gene product depicted in FIGS.
6A-6P. The term "functionally equivalent gene" as used
herein can further refer to a nucleotide sequence which
encodes a gene product of 1), 2) or 3), as described earlier
in this paragraph.
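The 80% criterion in definition 3) depends on how the two sequences are aligned and scored, which is not specified here. As one possible reading, the hypothetical Python function below computes percent identity over a pair of sequences that have already been aligned (gaps written as "-"), counting a gap opposite a residue as a mismatch. Identity is stricter than similarity, which would also credit conservative substitutions and would need a substitution matrix.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns with identical residues.

    Both inputs must already be aligned to equal length. Columns where both
    sequences have a gap are ignored; a gap opposite a residue counts as a
    mismatch (one of several conventions in use)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("alignment has no comparable columns")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Invented aligned fragments, not PKD1 sequence.
    score = percent_identity("ACDEFGHIKL", "ACDEFGHIKM")
    print(score, score >= 80.0)  # 90.0 True
```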

The invention also includes nucleic acid molecules,
preferably DNA molecules, that hybridize to, and are
therefore the complements of, the DNA sequences (a) through
(c) in the preceding paragraph. Such hybridization
conditions may be highly stringent or less highly stringent,
as described above. In instances wherein the nucleic acid
molecules are oligonucleotides ("oligos"), highly stringent
conditions may refer, e.g., to washing in 6xSSC/0.05% sodium
pyrophosphate at 37°C (for 14-base oligos), 48°C (for 17-base
oligos), 55°C (for 20-base oligos), and 60°C (for 23-base
oligos). These nucleic acid molecules may act as PKD1
antisense molecules, useful, for example, in PKD1 gene
regulation and/or as antisense primers in amplification
reactions of PKD1 nucleic acid sequences. Further, such
sequences may be used as part of ribozyme and/or triple helix
sequences, also useful for PKD1 gene regulation. Still
further, such molecules may be used as components of
diagnostic methods whereby the level of PKD1 transcript may
be deduced and/or the presence of an ADPKD-causing allele may
be detected. Further, such sequences can be used to screen
for and identify PKD1 homologs from, for example, other
species.

The invention also encompasses (a) DNA vectors that
contain any of the foregoing coding sequences and/or their
complements (i.e., antisense); (b) DNA expression vectors
that contain any of the foregoing coding sequences
operatively associated with a regulatory element that directs
the expression of the coding sequences; and (c) genetically
engineered host cells that contain any of the foregoing
coding sequences operatively associated with a regulatory
element that directs the expression of the coding sequences
in the host cell. As used herein, regulatory elements
include but are not limited to inducible and non-inducible
promoters, enhancers, operators and other elements known to
those skilled in the art that drive and regulate expression.
For example, such regulatory elements may include CMV
immediate early gene regulatory sequences, SV40 early or late
promoter sequences on adenovirus, lac system, trp system, tac
system or trc system sequences. The invention includes
fragments of any of the DNA sequences disclosed herein.

In addition to the PKD1 gene sequences described above,
homologs of the PKD1 gene of the invention, as may, for
example, be present in other, non-human species, may be
identified and isolated by molecular biological techniques
well known in the art, using, for example, labeled probes as
small as 12 bp. Further, mutant PKD1 alleles and additional
normal alleles of the human PKD1 gene of the invention may
be identified using such techniques. Still further, there
may exist genes at other genetic loci within the human genome
that encode proteins which have extensive homology to one or
more domains of the PKD1 gene product. Such genes may also
be identified via such techniques.

For example, such a previously unknown PKD1-type gene
sequence may be isolated by performing a polymerase chain
reaction (PCR; the experimental embodiment set forth by
Mullis, K.B., 1987, U.S. Patent No. 4,683,202) using two
degenerate oligonucleotide primer pools designed on the basis
of amino acid sequences encoded by the PKD1 gene described
herein (see, e.g., FIGS. 6A-6P, SEQ ID NO: 2). The template for
the reaction may be cDNA obtained by reverse transcription of
mRNA prepared from human or non-human cell lines or tissue
known to express a PKD1 allele or PKD1 homologue. The PCR
product may be subcloned and sequenced to ensure that the
amplified sequences represent the sequences of a PKD1 or a
PKD-like nucleic acid sequence. The PCR fragment may then be
used to isolate a full-length PKD1 cDNA clone by
radioactively labeling the amplified fragment and screening a
bacteriophage cDNA library. Alternatively, the labeled
fragment may be used to screen a genomic library. For a
review of cloning strategies which may be used, see, e.g.,
Maniatis, 1989, Molecular Cloning, A Laboratory Manual, Cold
Spring Harbor Press, N.Y.; and Ausubel et al., 1989, Current
Protocols in Molecular Biology (Greene Publishing Associates
and Wiley Interscience, N.Y.).
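The degenerate primer pools referred to above may be derived mechanically from a short peptide sequence. The following is offered only as an illustrative sketch and not as the method used in the Examples. It assumes the standard genetic code, writes the sense-strand primer with IUPAC ambiguity codes, and uses a hypothetical peptide ("MWDK") and hypothetical helper names (`iupac_primer`, `exact_degeneracy`); none of these are taken from the PKD1 sequence.

```python
from itertools import product
from math import prod

# Standard genetic code (NCBI translation table 1); codons are enumerated
# with bases in TCAG order, first position varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Reverse map: amino acid (one-letter code) -> list of synonymous codons.
SYNONYMOUS: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMOUS.setdefault(aa, []).append(codon)

# IUPAC nucleotide ambiguity codes, keyed by the set of bases each represents.
IUPAC = {
    frozenset(bases): code
    for bases, code in [
        ("A", "A"), ("C", "C"), ("G", "G"), ("T", "T"),
        ("AG", "R"), ("CT", "Y"), ("CG", "S"), ("AT", "W"), ("GT", "K"), ("AC", "M"),
        ("CGT", "B"), ("AGT", "D"), ("ACT", "H"), ("ACG", "V"), ("ACGT", "N"),
    ]
}


def exact_degeneracy(peptide: str) -> int:
    """Number of distinct sense-strand DNA sequences encoding the peptide."""
    return prod(len(SYNONYMOUS[aa]) for aa in peptide)


def iupac_primer(peptide: str) -> str:
    """Sense-strand degenerate primer, one IUPAC code per nucleotide position."""
    positions = []
    for aa in peptide:
        codons = SYNONYMOUS[aa]
        for i in range(3):
            # Union of the bases used at codon position i by all synonymous codons.
            positions.append(IUPAC[frozenset(c[i] for c in codons)])
    return "".join(positions)


# Hypothetical peptide, not taken from the PKD1 gene product.
print(iupac_primer("MWDK"))      # ATGTGGGAYAAR
print(exact_degeneracy("MWDK"))  # 4
```

Residues with only one or two codons (e.g., M, W, D, K) keep the pool small, so stretches rich in such residues are generally favored as primer sites. For six-codon residues (L, R, S), the per-position IUPAC code also admits codons for other residues. The written primer is therefore more degenerate than `exact_degeneracy` reports.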

5.2. THE PKD1 GENE PRODUCT 
The PKD1 gene products of the invention include the PKD1
gene product encoded by the PKD1 nucleotide sequence depicted
in FIGS. 6A-6P (SEQ ID NO: 2). The PKD1 gene product shown
in FIGS. 6A-6P is a protein of 4304 amino acid residues, with
a predicted mass of approximately 467 kilodaltons. This PKD1
gene product contains at least five distinct peptide domains
which are likely to be involved in protein-protein and/or
protein-carbohydrate interactions. Further, this PKD1 gene
product shares amino acid sequence similarity with a number
of extracellular matrix proteins. (See FIGS. 7A-7B and 8,
which list the PKD1 gene product domains.) The PKD1 gene
product domains are more fully described below, in the
Example presented in Section 10.
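The predicted mass quoted above follows from the residue count. 467,000 Da divided by 4304 residues gives roughly 108.5 Da per residue, which is close to the average residue mass of about 110 Da commonly assumed for proteins. The sketch below shows how such a prediction may be computed from a sequence. It assumes standard average residue masses and ignores glycosylation and other post-translational modifications, so it is not a measured value. The function name is hypothetical.

```python
# Average (not monoisotopic) residue masses in daltons. A residue is the free
# amino acid minus the water lost when a peptide bond forms.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_DA = 18.01528  # added once to account for the free N- and C-termini


def predicted_mass_da(sequence: str) -> float:
    """Predicted average mass of an unmodified polypeptide, in daltons."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_DA


# Mean residue mass implied by the figures quoted for the PKD1 gene product.
print(f"{467_000 / 4304:.1f} Da per residue")  # 108.5 Da per residue
print(f"{predicted_mass_da('GG'):.2f} Da")      # 132.12 Da (glycylglycine)
```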

In addition, PKD1 gene products that represent
functionally equivalent gene products are within the scope of
the invention. "Functionally equivalent" as used herein is
as defined in Section 5.1, above. Such an equivalent PKD1
gene product may contain deletions, additions or
substitutions of amino acid residues within the PKD1 sequence
encoded by the PKD1 gene sequences described, above, in
Section 5.1.3, but which result in a silent change, thus
producing a functionally equivalent PKD1 protein. Such amino
acid substitutions may be made on the basis of similarity in
polarity, charge, solubility, hydrophobicity, hydrophilicity,
and/or the amphipathic nature of the residues involved. For
example, negatively charged amino acids include aspartic acid
and glutamic acid; positively charged amino acids include
lysine and arginine; amino acids with uncharged polar head
groups having similar hydrophilicity values include the
following: leucine, isoleucine, valine, glycine, alanine,
asparagine, glutamine, serine, threonine, phenylalanine and
tyrosine. As used herein, a functionally equivalent PKD1
protein refers to a protein that exhibits substantially the
same biological activity as the PKD1 gene product encoded by
the PKD1 gene sequences described in Section 5.1.1, above.

PKD1 gene products and peptides substantially similar to
the PKD1 gene product encoded by the PKD1 gene sequences
described in Section 5.1, above, which cause ADPKD symptoms
are also intended to fall within the scope of the invention.
Such gene products and peptides may include dominant mutant
PKD1 gene products, or PKD1 gene products functionally
equivalent to such mutant PKD1 gene products. By
"functionally equivalent mutant PKD1 gene product" is
meant PKD1-like proteins that exhibit a biological activity
substantially similar to the activity demonstrated by
dominant mutant PKD1 gene products.

The PKD1 wild type or mutant protein may be purified
from natural sources, as discussed in Section 5.2.1, below,
or may, alternatively, be chemically synthesized or
recombinantly expressed, as discussed in Section 5.2.2,
below.

5.2.1. PKD1 PROTEIN PURIFICATION METHODS

The PKD1 protein may be substantially purified from
natural sources (e.g., purified from cells) using protein
separation techniques well known in the art. "Substantially
purified" signifies purified away from at least about 90% (on
a weight basis), and preferably from at least about 99%, of
other proteins, glycoproteins, and other macromolecules
normally found in such natural sources.

Such purification techniques may include, but are not
limited to, ammonium sulfate precipitation, molecular sieve
chromatography, and/or ion exchange chromatography.
Alternatively, or additionally, the PKD1 gene product may be
purified by immunoaffinity chromatography using an
immunoadsorbent column on which an antibody capable of
binding the PKD1 gene product is immobilized. Such an
antibody may be monoclonal or polyclonal in origin. If the
PKD1 gene product is specifically glycosylated, the
glycosylation pattern may be utilized as part of a
purification scheme via, for example, lectin chromatography.

The cellular sources from which the PKD1 gene product
may be purified may include, but are not limited to, those
cells that are expected, by Northern and/or Western blot
analysis, to express the PKD1 gene. Preferably, such
cellular sources are renal tubular epithelial cells, biliary
duct cells, skeletal muscle cells, whole brain cells, lung
alveolar epithelial cells, and placental cells.

One or more forms of the PKD1 gene product may be
secreted out of the cell, i.e., may be extracellular. Such
extracellular forms of the PKD1 gene product may preferably
be purified from whole tissue rather than cells, utilizing
any of the techniques described above. Preferable tissue
includes, but is not limited to, those tissues that contain
cell types such as those described above. Alternatively,
PKD1-expressing cells such as those described above may be
grown in cell culture, under conditions well known to those
of skill in the art. The PKD1 gene product may then be
purified from the cell media using any of the techniques
discussed above.

5.2.2. PKD1 PROTEIN SYNTHESIS AND EXPRESSION METHODS 

Methods for the chemical synthesis of polypeptides
(e.g., gene products) or fragments thereof are well known to
those of ordinary skill in the art; e.g., peptides can be
synthesized by solid phase techniques, cleaved from the resin
and purified by preparative high performance liquid
chromatography (see, e.g., Creighton, 1983, Proteins:
Structures and Molecular Principles, W.H. Freeman & Co.,
N.Y., pp. 50-60). The composition of the synthetic peptides
may be confirmed by amino acid analysis or sequencing, e.g.,
using the Edman degradation procedure (see, e.g., Creighton,
1983, supra at pp. 34-49). Thus, the PKD1 protein may be
chemically synthesized in whole or in part.

The PKD1 protein may additionally be produced by
recombinant DNA technology using the PKD1 nucleotide
sequences as described, above, in Section 5.1, coupled with
techniques well known in the art. Thus, methods for
preparing the PKD1 polypeptides and peptides of the invention
by expressing nucleic acid encoding PKD1 sequences are
described herein. Methods which are well known to those
skilled in the art can be used to construct expression
vectors containing PKD1 protein coding sequences and
appropriate transcriptional/translational control signals.
These methods include, for example, in vitro recombinant DNA
techniques, synthetic techniques and in vivo
recombination/genetic recombination. See, for example, the
techniques described in Maniatis et al., 1989, Molecular
Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory,
N.Y., and Ausubel et al., 1989, Current Protocols in Molecular
Biology, Greene Publishing Associates and Wiley Interscience,
N.Y., both of which are incorporated by reference herein in
their entirety. Alternatively, RNA capable of encoding PKD1
protein sequences may be chemically synthesized using, for
example, automated or semi-automated synthesizers. See, for
example, the techniques described in "Oligonucleotide
Synthesis", 1984, Gait, M.J. ed., IRL Press, Oxford, which is
incorporated by reference herein in its entirety.

A variety of host-expression vector systems may be
utilized to express the PKD1 coding sequences of the
invention. Such host-expression systems represent vehicles
by which the coding sequences of interest may be produced and
subsequently purified, but also represent cells which may,
when transformed or transfected with the appropriate
nucleotide coding sequences, exhibit the PKD1 protein of the
invention in situ. These include but are not limited to
microorganisms such as bacteria (e.g., E. coli, B. subtilis)
transformed with recombinant bacteriophage DNA, plasmid DNA
or cosmid DNA expression vectors containing PKD1 protein
coding sequences; yeast (e.g., Saccharomyces, Pichia)
transformed with recombinant yeast expression vectors
containing the PKD1 protein coding sequences; insect cell
systems infected with recombinant virus expression vectors
(e.g., baculovirus) containing the PKD1 protein coding
sequences; plant cell systems infected with recombinant virus
expression vectors (e.g., cauliflower mosaic virus, CaMV;
tobacco mosaic virus, TMV) or transformed with recombinant
plasmid expression vectors (e.g., Ti plasmid) containing the
PKD1 protein coding sequences; or mammalian cell systems
(e.g., COS, CHO, BHK, 293, 3T3) harboring recombinant
expression constructs containing promoters derived from the
genome of mammalian cells (e.g., metallothionein promoter) or
from mammalian viruses (e.g., the adenovirus late promoter;
the vaccinia virus 7.5K promoter).

In bacterial systems, a number of expression vectors may
be advantageously selected depending upon the use intended
for the PKD1 protein being expressed. For example, when a
large quantity of such a protein is to be produced, for the
generation of antibodies or to screen peptide libraries, for
example, vectors which direct the expression of high levels
of fusion protein products that are readily purified may be
desirable. Such vectors include, but are not limited to, the
E. coli expression vector pUR278 (Ruther et al., 1983, EMBO
J. 2:1791), in which the PKD1 protein coding sequence may be
ligated individually into the vector in frame with the lacZ
coding region so that a fusion protein is produced; pIN
vectors (Inouye & Inouye, 1985, Nucleic Acids Res. 13:3101-3109;
Van Heeke & Schuster, 1989, J. Biol. Chem. 264:5503-5509);
and the like. pGEX vectors may also be used to express
foreign polypeptides as fusion proteins with glutathione
S-transferase (GST). In general, such fusion proteins are
soluble and can easily be purified from lysed cells by
adsorption to glutathione-agarose beads followed by elution
in the presence of free glutathione. The pGEX vectors are
designed to include thrombin or factor Xa protease cleavage
sites so that the cloned PKD1 protein can be released from
the GST moiety.
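The release step described above depends on the protease recognition site placed between the GST moiety and the cloned protein. The sketch below models that step on amino acid strings. It assumes the commonly used thrombin site Leu-Val-Pro-Arg/Gly-Ser and the factor Xa site Ile-(Glu or Asp)-Gly-Arg, and it treats cleavage as determined by sequence alone. The fusion sequences and the function name are hypothetical and do not represent an actual pGEX construct.

```python
import re

# Recognition motifs; each protease cleaves after the arginine of its motif.
CLEAVAGE_MOTIFS = {
    "thrombin": re.compile(r"LVPR(?=GS)"),  # LVPR^GS: Gly-Ser stays on the released protein
    "factor_xa": re.compile(r"I[ED]GR"),    # I(E/D)GR^
}


def cleave(fusion: str, protease: str) -> tuple[str, str]:
    """Split a fusion protein at the first recognition site for `protease`.

    Returns (tag_side, released_protein). Raises ValueError if no site exists.
    A cloned insert carrying an internal site would also be cut in practice;
    this first-match model does not account for that.
    """
    match = CLEAVAGE_MOTIFS[protease].search(fusion)
    if match is None:
        raise ValueError(f"no {protease} site in fusion sequence")
    return fusion[: match.end()], fusion[match.end():]


# Hypothetical fusions; "TAGTAG" stands in for the GST moiety.
tag, released = cleave("TAGTAG" + "LVPRGS" + "MKTAYIA", "thrombin")
print(released)  # GSMKTAYIA
print(cleave("TAGTAG" + "IEGR" + "MKT", "factor_xa")[1])  # MKT
```

As the thrombin case shows, the choice of protease determines whether extra residues remain on the released protein.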

In an insect system, Autographa californica nuclear
polyhedrosis virus (AcNPV) is used as a vector to express
foreign genes. The virus grows in Spodoptera frugiperda
natural and synthetic. The efficiency of expression may be
enhanced by the inclusion of appropriate transcription
enhancer elements, transcription terminators, etc. (see
Bittner et al., 1987, Methods in Enzymol. 153:516-544).

In addition, a host cell strain may be chosen which
modulates the expression of the inserted sequences, or
modifies and processes the gene product in the specific
fashion desired. Such modifications (e.g., glycosylation)
and processing (e.g., cleavage) of protein products may be
important for the function of the protein. Different host
cells have characteristic and specific mechanisms for the
post-translational processing and modification of proteins.
Appropriate cell lines or host systems can be chosen to
ensure the correct modification and processing of the foreign
protein expressed. To this end, eukaryotic host cells which
possess the cellular machinery for proper processing of the
primary transcript, glycosylation, and phosphorylation of the
gene product may be used. Such mammalian host cells include,
but are not limited to, CHO, VERO, BHK, HeLa, COS, MDCK, 293,
3T3, WI38, etc.

For long-term, high-yield production of recombinant
proteins, stable expression is preferred. For example, cell
lines which stably express the PKD1 protein may be
engineered. Rather than using expression vectors which
contain viral origins of replication, host cells can be
transformed with DNA controlled by appropriate expression
control elements (e.g., promoter and enhancer sequences,
transcription terminators, polyadenylation sites, etc.) and
a selectable marker. Following the introduction of the
foreign DNA, engineered cells may be allowed to grow for 1-2
days in an enriched medium, and then are switched to a
selective medium. The selectable marker in the recombinant
plasmid confers resistance to the selection and allows cells
to stably integrate the plasmid into their chromosomes and
grow to form foci which in turn can be cloned and expanded
into cell lines. This method may advantageously be used to
engineer cell lines which express the PKD1 protein. Such
engineered cell lines may be particularly useful in screening
and evaluation of compounds that affect the endogenous
activity of the PKD1 protein.

A number of selection systems may be used, including but
not limited to the herpes simplex virus thymidine kinase
(Wigler, et al., 1977, Cell 11:223), hypoxanthine-guanine
phosphoribosyltransferase (Szybalska & Szybalski, 1962, Proc.
Natl. Acad. Sci. USA 48:2026), and adenine
phosphoribosyltransferase (Lowy, et al., 1980, Cell 22:817)
genes, which can be employed in tk-, hgprt- or aprt- cells,
respectively. Also, antimetabolite resistance can be used as
the basis of selection for dhfr, which confers resistance to
methotrexate (Wigler, et al., 1980, Proc. Natl. Acad. Sci. USA
77:3567; O'Hare, et al., 1981, Proc. Natl. Acad. Sci. USA
78:1527); gpt, which confers resistance to mycophenolic acid
(Mulligan & Berg, 1981, Proc. Natl. Acad. Sci. USA 78:2072);
neo, which confers resistance to the aminoglycoside G-418
(Colberre-Garapin, et al., 1981, J. Mol. Biol. 150:1); and
hygro, which confers resistance to hygromycin (Santerre, et
al., 1984, Gene 30:147) genes.
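The selection systems listed above fall into two groups. Markers in the first group (tk, hgprt, aprt) require a host deficient in the corresponding enzyme. Markers in the second group (dhfr, gpt, neo, hygro) confer resistance to a named drug. The short lookup below, a sketch with hypothetical names, makes that distinction explicit when choosing a marker for a given host line. The text does not specify selective media for the first group, so those entries are left unset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SelectionMarker:
    gene: str
    resistance_to: Optional[str]   # drug named in the text, if any
    host_must_lack: Optional[str]  # enzyme the host must be deficient in, if any


# Entries transcribed from the passage above.
MARKERS = (
    SelectionMarker("HSV tk", None, "tk"),
    SelectionMarker("hgprt", None, "hgprt"),
    SelectionMarker("aprt", None, "aprt"),
    SelectionMarker("dhfr", "methotrexate", None),
    SelectionMarker("gpt", "mycophenolic acid", None),
    SelectionMarker("neo", "G-418", None),
    SelectionMarker("hygro", "hygromycin", None),
)


def usable_markers(host_deficiencies: set[str]) -> list[str]:
    """Markers usable in a host lacking the given enzymes (e.g. {"tk"})."""
    return [
        m.gene
        for m in MARKERS
        if m.host_must_lack is None or m.host_must_lack in host_deficiencies
    ]


print(usable_markers(set()))   # ['dhfr', 'gpt', 'neo', 'hygro']
print(usable_markers({"tk"}))  # ['HSV tk', 'dhfr', 'gpt', 'neo', 'hygro']
```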

Whether produced by molecular cloning methods or by
chemical synthetic methods, the amino acid sequence of the
PKD1 protein which may be used in the assays of the invention
need not be identical to the amino acid sequence encoded by
the PKD1 gene reported herein. The PKD1 proteins or peptides
used may comprise altered sequences in which amino acid
residues are deleted, added, or substituted, while still
resulting in a gene product functionally equivalent to the
PKD1 gene product. "Functionally equivalent", as utilized
herein, is as defined, above, in Section 5.1, and is
additionally defined to refer to peptides capable of
interacting with other cellular or extracellular molecules in
a manner substantially similar to the way in which the
corresponding portion of the endogenous PKD1 gene product
would.

For example, functionally equivalent amino acid residues
may be substituted for residues within the sequence, resulting
in a change of amino acid sequence. Such substitutes may be
selected from other members of the class (i.e., nonpolar,
polar neutral, positively charged or negatively charged) to
which the amino acid belongs; e.g., the nonpolar (hydrophobic)
amino acids include alanine, leucine, isoleucine, valine,
proline, phenylalanine, tryptophan, and methionine; the polar
neutral amino acids include glycine, serine, threonine,
cysteine, tyrosine, asparagine, and glutamine; the positively
charged (basic) amino acids include arginine, lysine, and
histidine; and the negatively charged (acidic) amino acids
include aspartic and glutamic acid.
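The four residue classes listed above amount to a partition of the twenty amino acids. Once they are written down, "conservative" substitutions can be checked or enumerated mechanically. The sketch below encodes exactly those classes. The function names and the example sequence "MKD" are hypothetical, and same-class membership is only the starting criterion described in the text, not a guarantee of functional equivalence.

```python
# Residue classes as listed in the passage above (one-letter codes).
RESIDUE_CLASSES = {
    "nonpolar": frozenset("ALIVPFWM"),
    "polar_neutral": frozenset("GSTCYNQ"),
    "basic": frozenset("RKH"),
    "acidic": frozenset("DE"),
}


def residue_class(aa: str) -> str:
    """Return the class name for a one-letter amino acid code."""
    for name, members in RESIDUE_CLASSES.items():
        if aa in members:
            return name
    raise ValueError(f"unknown residue {aa!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same class."""
    return residue_class(original) == residue_class(replacement)


def conservative_variants(sequence: str, position: int) -> list[str]:
    """All single same-class substitutions at `position` (0-based)."""
    aa = sequence[position]
    members = sorted(RESIDUE_CLASSES[residue_class(aa)] - {aa})
    return [sequence[:position] + m + sequence[position + 1:] for m in members]


print(is_conservative("D", "E"))        # True
print(is_conservative("K", "D"))        # False
print(conservative_variants("MKD", 2))  # ['MKE']
```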

When used as a component in the assay systems described
herein, the PKD1 gene product or peptide (e.g., gene product
fragment) may be labeled, either directly or indirectly, to
facilitate detection of a complex formed between the PKD1
gene product and a test substance. Any of a variety of
suitable labeling systems may be used, including but not
limited to radioisotopes such as ¹²⁵I; enzyme labeling
systems that generate a detectable colorimetric signal or
light when exposed to substrate; and fluorescent labels.

Where recombinant DNA technology is used to produce the
PKD1 protein for the assay systems described herein, it may
be advantageous to engineer fusion proteins that can
facilitate labeling, immobilization and/or detection. For
example, the coding sequence of the PKD1 protein can be
fused to that of a heterologous protein that has enzyme
activity or serves as an enzyme substrate in order to
facilitate labeling and detection. The fusion constructs
should be designed so that the heterologous component of the
fusion product does not interfere with binding of the PKD1
protein and its binding partner.

Indirect labeling involves the use of a third protein,
such as a labeled antibody, which specifically binds to one
of the binding partners, i.e., either the PKD1 protein or its
binding partner used in the assay. Such antibodies include
but are not limited to polyclonal, monoclonal, chimeric, and
single chain antibodies, Fab fragments, and fragments produced
by a Fab expression library.

5.3. ANTIBODIES REACTIVE WITH PKD1 GENE PRODUCT 
Described herein are methods for the production of
antibodies capable of specifically recognizing one or more
PKD1 gene product epitopes. Such antibodies may include, but
are not limited to, polyclonal antibodies, monoclonal
antibodies (mAbs), humanized or chimeric antibodies, single
chain antibodies, Fab fragments, F(ab')2 fragments, fragments
produced by a Fab expression library, anti-idiotypic (anti-Id)
antibodies, and epitope-binding fragments of any of the
above. Such antibodies may be used, for example, in the
detection of PKD1 gene product in a biological sample, or,
alternatively, as a method for the inhibition of abnormal
PKD1 activity. Thus, such antibodies may be utilized as part
of ADPKD treatment methods, and/or may be used as part of
diagnostic techniques whereby patients may be tested for
abnormal levels of PKD1 gene product, or for the presence of
abnormal forms of the PKD1 protein.

For the production of antibodies to PKD1, various host
animals may be immunized by injection with PKD1 protein, or a
portion thereof. Such host animals may include but are not
limited to rabbits, mice, and rats. Various adjuvants may
be used to increase the immunological response, depending on
the host species, including, but not limited to, Freund's
(complete and incomplete), mineral gels such as aluminum
hydroxide, surface active substances such as lysolecithin,
pluronic polyols, polyanions, peptides, oil emulsions,
keyhole limpet hemocyanin, dinitrophenol, and potentially
useful human adjuvants such as BCG (bacille Calmette-Guerin)
and Corynebacterium parvum.

Polyclonal antibodies are heterogeneous populations of
antibody molecules derived from the sera of animals immunized
with an antigen, such as PKD1, or an antigenic functional
derivative thereof. For the production of polyclonal
antibodies, host animals such as those described above may
be immunized by injection with PKD1 protein supplemented with
adjuvants as also described above.

Monoclonal antibodies, which are substantially
homogeneous populations of antibodies to a particular
antigen, may be obtained by any technique which provides for
the production of antibody molecules by continuous cell lines
in culture. These include, but are not limited to, the
hybridoma technique of Kohler and Milstein (1975, Nature
256:495-497; and U.S. Patent No. 4,376,110), the human B-cell
hybridoma technique (Kosbor et al., 1983, Immunology Today
4:72; Cole et al., 1983, Proc. Natl. Acad. Sci. USA
80:2026-2030), and the EBV-hybridoma technique (Cole et al.,
1985, Monoclonal Antibodies And Cancer Therapy, Alan R. Liss,
Inc., pp. 77-96). Such antibodies may be of any immunoglobulin
class, including IgG, IgM, IgE, IgA, IgD and any subclass
thereof. The hybridoma producing the mAb of this invention
may be cultivated in vitro or in vivo. Production of high
titers of mAbs in vivo makes this the presently preferred
method of production.

In addition, techniques developed for the production of
"chimeric antibodies" (Morrison et al., 1984, Proc. Natl.
Acad. Sci. 81:6851-6855; Neuberger et al., 1984, Nature
311:604-608; Takeda et al., 1985, Nature 314:452-454; U.S.
Patent No. 4,816,567, which is incorporated by reference
herein in its entirety) by splicing the genes from a mouse
antibody molecule of appropriate antigen specificity together
with genes from a human antibody molecule of appropriate
biological activity can be used. A chimeric antibody is a
molecule in which different portions are derived from
different animal species, such as those having a murine
variable region and a human immunoglobulin constant region.

Alternatively, techniques described for the production
of single chain antibodies (U.S. Patent 4,946,778; Bird,
1988, Science 242:423-426; Huston et al., 1988, Proc. Natl.
Acad. Sci. USA 85:5879-5883; and Ward et al., 1989, Nature
341:544-546) can be adapted to produce PKD1-single chain
antibodies. Single chain antibodies are formed by linking
the heavy and light chain fragments of the Fv region via an
amino acid bridge, resulting in a single chain polypeptide.

Further, PKD1-humanized monoclonal antibodies may be
produced using standard techniques (see, for example, U.S.
Patent No. 5,225,539, which is incorporated herein by
reference in its entirety).

Antibody fragments which recognize specific epitopes may
be generated by known techniques. For example, such
fragments include but are not limited to the F(ab')2
fragments, which can be produced by pepsin digestion of the
antibody molecule, and the Fab fragments, which can be
generated by reducing the disulfide bridges of the F(ab')2
fragments. Alternatively, Fab expression libraries may be
constructed (Huse et al., 1989, Science 246:1275-1281) to
allow rapid and easy identification of monoclonal Fab
fragments with the desired specificity.

5.4. SCREENING ASSAYS FOR COMPOUNDS
THAT INTERACT WITH THE PKD1 GENE PRODUCT

The following assays are designed to identify compounds
that bind to the PKD1 gene product; other cellular proteins
that interact with the PKD1 gene product; and compounds that
interfere with the interaction of the PKD1 gene product with
other cellular proteins.

Compounds identified via assays such as those described
herein may be useful, for example, in elaborating the
biological function of the PKD1 gene product, and for
ameliorating ADPKD symptoms caused by mutations within the
PKD1 gene. In instances whereby a mutation within the PKD1
gene causes a lower level of expression, and therefore
results in an overall lower level of PKD1 activity in a cell
or tissue, compounds that interact with the PKD1 gene product
may include ones which accentuate or amplify the activity of
the bound PKD1 protein. Thus, such compounds would bring
about an effective increase in the level of PKD1 activity,
thus ameliorating ADPKD symptoms. In instances whereby
mutations within the PKD1 gene cause aberrant PKD1 proteins to
be made which have a deleterious effect that leads to ADPKD,
compounds that bind PKD1 protein may be identified that
inhibit the activity of the bound PKD1 protein.

This decrease in the aberrant PKD1 activity can,
therefore, serve to ameliorate ADPKD symptoms. Assays for
testing the effectiveness of compounds, identified by, for
example, techniques such as those described in this Section
are discussed, below, in Section 5.3.

5.5. IN VITRO SCREENING ASSAYS FOR
COMPOUNDS THAT BIND TO THE PKD1 PROTEIN

In vitro systems may be designed to identify compounds
capable of binding the PKD1 gene product of the invention.
Such compounds may include, but are not limited to, peptides
made of D- and/or L-configuration amino acids (in, for
example, the form of random peptide libraries; see Lam, K.S.
et al., 1991, Nature 354:82-84), phosphopeptides (in, for
example, the form of random or partially degenerate, directed
phosphopeptide libraries; see, for example, Songyang, Z. et
al., 1993, Cell 72:767-778), antibodies, and small or large
organic or inorganic molecules. Compounds identified may be
useful, for example, in modulating the activity of PKD1
proteins, preferably mutant PKD1 proteins; may be useful in
elaborating the biological function of the PKD1 protein; may
be utilized in screens for identifying compounds that disrupt
normal PKD1 interactions; or may in themselves disrupt such
interactions.

The principle of the assays used to identify compounds
that bind to the PKD1 protein involves preparing a reaction
mixture of the PKD1 protein and the test compound under
conditions and for a time sufficient to allow the two
components to interact and bind, thus forming a complex which
can be removed and/or detected in the reaction mixture.
These assays can be conducted in a heterogeneous or
homogeneous format. Heterogeneous assays involve anchoring
PKD1 or the test substance onto a solid phase and detecting
PKD1/test substance complexes anchored on the solid phase at
the end of the reaction. In homogeneous assays, the entire
reaction is carried out in a liquid phase. In either
approach, the order of addition of reactants can be varied to
obtain different information about the compounds being
tested.

In a heterogeneous assay system, the PKD1 protein may be
anchored onto a solid surface, and the test substance, which
is not anchored, is labeled, either directly or indirectly.
In practice, microtiter plates are conveniently utilized.
The anchored component may be immobilized by non-covalent or
covalent attachments. Non-covalent attachment may be
accomplished simply by coating the solid surface with a
solution of the protein and drying. Alternatively, an
immobilized antibody, preferably a monoclonal antibody,
specific for the protein may be used to anchor the protein to
the solid surface. The surfaces may be prepared in advance
and stored.

In order to conduct the assay, the labeled component is
added to the coated surface containing the anchored
component. After the reaction is complete, unreacted
components are removed (e.g., by washing) under conditions
such that any complexes formed will remain immobilized on the
solid surface. The detection of complexes anchored on the
solid surface can be accomplished in a number of ways. Where
the previously non-immobilized component is pre-labeled, the
detection of label immobilized on the surface indicates that
complexes were formed. Where the previously non-immobilized
component is not pre-labeled, an indirect label can be used
to detect complexes anchored on the surface, e.g., using a
labeled antibody specific for the binding partner (the
antibody, in turn, may be directly labeled or indirectly
labeled with a labeled anti-Ig antibody).

Alternatively, a heterogeneous reaction can be conducted
in a liquid phase, the reaction products separated from
unreacted components, and complexes detected, e.g., using an
immobilized antibody specific for PKD1 or the test substance
to anchor any complexes formed in solution, and a labeled
antibody specific for the other binding partner to detect
anchored complexes.
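In the heterogeneous formats above, the read-out is the label retained on the surface after washing. A test substance is scored as binding when that signal clearly exceeds wells in which no specific complex can form. The text does not specify a cutoff. The sketch below assumes one common convention, a threshold equal to the negative-control mean plus three standard deviations. The well names, the control design, and the signal values are hypothetical.

```python
from statistics import mean, stdev


def binding_threshold(negative_controls: list[float], k: float = 3.0) -> float:
    """Mean + k * sample standard deviation of the negative-control wells."""
    return mean(negative_controls) + k * stdev(negative_controls)


def call_binders(
    signals: dict[str, float], negative_controls: list[float], k: float = 3.0
) -> list[str]:
    """Test substances whose retained-label signal exceeds the threshold."""
    cutoff = binding_threshold(negative_controls, k)
    return sorted(name for name, value in signals.items() if value > cutoff)


# Hypothetical read-out after washing (arbitrary units). Controls stand for
# wells where no specific complex can form, e.g. a surface without anchored PKD1.
controls = [100.0, 110.0, 90.0, 100.0]
wells = {"cmpd_A": 130.0, "cmpd_B": 115.0, "cmpd_C": 260.0}
print(round(binding_threshold(controls), 1))  # 124.5
print(call_binders(wells, controls))          # ['cmpd_A', 'cmpd_C']
```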

In an alternate embodiment of the invention, a
homogeneous assay can be used. In this approach, a preformed
complex of the PKD1 protein and a known binding partner is
prepared in which one of the components is labeled, but the
signal generated by the label is quenched due to complex
formation (see, e.g., U.S. Patent No. 4,109,496 by Rubenstein,
which utilizes this approach for immunoassays). The addition
of a test substance that competes with and displaces one of
the binding partners from the preformed complex will result
in the generation of a signal above background.
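Because the label in the preformed complex is quenched, the signal rises as a test substance displaces the labeled partner. One way to express that rise is as a fractional displacement. This assumes, although the text does not state it, that the signal varies linearly between the fully quenched complex and the same amount of free label. The function name and readings below are hypothetical.

```python
def fraction_displaced(signal: float, quenched: float, free: float) -> float:
    """Fraction of labeled partner released from the preformed complex.

    quenched: signal of the intact complex (the background).
    free:     signal of the same amount of label with no complex formed.
    The result is clipped to [0, 1] to absorb measurement noise.
    """
    if free <= quenched:
        raise ValueError("free-label signal must exceed the quenched background")
    raw = (signal - quenched) / (free - quenched)
    return min(1.0, max(0.0, raw))


# Hypothetical fluorescence readings (arbitrary units).
print(fraction_displaced(600.0, quenched=100.0, free=1100.0))  # 0.5
print(fraction_displaced(90.0, quenched=100.0, free=1100.0))   # 0.0
```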

5.6. ASSAYS FOR CELLULAR PROTEINS
THAT INTERACT WITH PKD1 PROTEIN

Any method suitable for detecting protein-protein
interactions may be employed for identifying novel
interactions between PKD1 and cellular or extracellular
proteins. For example, some traditional methods which may be
employed are co-immunoprecipitation, crosslinking and
copurification through gradients or chromatographic columns.
Additionally, methods which result in the simultaneous
identification of the genes coding for the protein
interacting with a target protein may be employed. These
methods include, for example, probing expression libraries
with labeled target protein, using this protein in a manner
similar to antibody probing of λgt11 libraries.

One such method which detects protein interactions in
vivo, the yeast two-hybrid system, is described in detail for
illustration only and not by way of limitation. One version
of this system has been described (Chien et al., 1991, Proc.
Natl. Acad. Sci. USA 88:9578-9582) and is commercially
available from Clontech (Palo Alto, CA).

Briefly, utilizing such a system, plasmids are
constructed that encode two hybrid proteins: one consists of
the DNA-binding domain of a transcription activator protein
fused to one test protein "X" and the other consists of the
driven by a promoter which contains GAL4 activation
sequences. A cDNA-encoded protein, fused to the GAL4
activation domain, that interacts with PKD1 will reconstitute
an active GAL4 protein and thereby drive expression of the
lacZ gene. Colonies which express lacZ can be detected by
their blue color in the presence of X-gal. The cDNA can then
be extracted from strains derived from these and used to
produce and isolate the PKD1-interacting protein using
techniques routinely practiced in the art.



5.7. ASSAYS FOR COMPOUNDS THAT INTERFERE
WITH PKD1/CELLULAR PROTEIN INTERACTION



The PKD1 protein of the invention may, in vivo, interact
with one or more cellular or extracellular proteins. Such
cellular or extracellular proteins are referred to herein as
"binding partners". Compounds that disrupt such interactions
may be useful in regulating the activity of the PKD1 protein,
especially mutant PKD1 proteins. Such compounds may include,
but are not limited to, molecules such as antibodies,
peptides, and the like described in Section 5.2.1, above.

In instances wherein ADPKD symptoms are caused by a
mutation within the PKD1 gene which produces PKD1 gene
products having aberrant, gain-of-function activity,
compounds identified that disrupt such interactions may
therefore inhibit the aberrant PKD1 activity. Preferably,
compounds may be identified which disrupt the interaction of
mutant PKD1 gene products with cellular or extracellular
proteins, but do not substantially affect the interactions of
the normal PKD1 protein. Such compounds may be identified by
comparing the effectiveness of a compound in disrupting
interactions in an assay containing normal PKD1 protein to
its effectiveness in an assay containing mutant PKD1 protein.

The basic principle of the assay systems used to
identify compounds that interfere with the interaction
between the PKD1 protein, preferably mutant PKD1 protein, and
its cellular or extracellular protein binding partner or
partners involves preparing a reaction mixture containing the
PKD1 protein and the binding partner under conditions and for
a time sufficient to allow the two proteins to interact and
bind, thus forming a complex. In order to test a compound
for inhibitory activity, the reaction is conducted in the
presence and absence of the test compound, i.e., the test
compound may be initially included in the reaction mixture,
or added at a time subsequent to the addition of PKD1 and its
cellular or extracellular binding partner; controls are
incubated without the test compound or with a placebo. The
formation of any complexes between the PKD1 protein and the
cellular or extracellular binding partner is then detected.
The formation of a complex in the control reaction, but not
in the reaction mixture containing the test compound,
indicates that the compound interferes with the interaction
of the PKD1 protein and the interactive protein. As noted
above, complex formation within reaction mixtures containing
the test compound and normal PKD1 protein may also be
compared to complex formation within reaction mixtures
containing the test compound and mutant PKD1 protein. This
comparison may be important in those cases wherein it is
desirable to identify compounds that disrupt interactions of
mutant but not normal PKD1 proteins.
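
The control-versus-compound comparison, run separately with
normal and mutant PKD1, reduces to simple arithmetic on the
detected complex signal. The sketch below is illustrative
only and is not part of the described method: it assumes
each reaction yields one numeric signal (for example, bound
label), that a background reaction in which no complex can
form is run alongside, and it uses hypothetical cutoffs (50%
and 20%) to call a compound mutant-selective.

```python
from dataclasses import dataclass


@dataclass
class ReactionSignals:
    """Mean complex signals for one PKD1 form (normal or mutant); units are arbitrary."""
    control: float     # PKD1 + binding partner, no test compound (or placebo)
    treated: float     # PKD1 + binding partner + test compound
    background: float  # reaction in which no complex can form


def percent_inhibition(s: ReactionSignals) -> float:
    """Percentage of the control complex signal abolished by the test compound."""
    window = s.control - s.background
    if window <= 0:
        raise ValueError("control signal must exceed background")
    return 100.0 * (s.control - s.treated) / window


def is_mutant_selective(mutant: ReactionSignals, normal: ReactionSignals,
                        min_mutant: float = 50.0, max_normal: float = 20.0) -> bool:
    """Hypothetical rule: strong disruption of the mutant complex, little of the normal one."""
    return (percent_inhibition(mutant) >= min_mutant
            and percent_inhibition(normal) <= max_normal)


if __name__ == "__main__":
    mutant = ReactionSignals(control=1000.0, treated=300.0, background=100.0)
    normal = ReactionSignals(control=1000.0, treated=950.0, background=100.0)
    print(round(percent_inhibition(mutant), 1))  # 77.8
    print(round(percent_inhibition(normal), 1))  # 5.6
    print(is_mutant_selective(mutant, normal))   # True
```

Normalizing to the control-minus-background window lets the
normal and mutant reactions be compared even if their raw
signal levels differ.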

The assay for compounds that interfere with the
interaction of the binding partners can be conducted in a
heterogeneous or homogeneous format. Heterogeneous assays
involve anchoring one of the binding partners onto a solid
phase and detecting complexes anchored on the solid phase at
the end of the reaction. In homogeneous assays, the entire
reaction is carried out in a liquid phase. In either
approach, the order of addition of reactants can be varied to
obtain different information about the compounds being
tested. For example, test compounds that interfere with the
interaction between the binding partners, e.g., by
competition, can be identified by conducting the reaction in
the presence of the test substance, i.e., by adding the test
substance to the reaction mixture prior to or simultaneously
with the PKD1 protein and interactive cellular or
extracellular protein. On the other hand, test compounds
that disrupt preformed complexes, e.g., compounds with higher
binding constants that displace one of the binding partners
from the complex, can be tested by adding the test compound
to the reaction mixture after complexes have been formed.
The various formats are described briefly below.

In a heterogeneous assay system, one binding partner,
e.g., either the PKD1 protein or the interactive cellular or
extracellular protein, is anchored onto a solid surface, and
its binding partner, which is not anchored, is labeled,
either directly or indirectly. In practice, microtiter
plates are conveniently utilized. The anchored species may
be immobilized by non-covalent or covalent attachment.
Non-covalent attachment may be accomplished simply by coating
the solid surface with a solution of the protein and drying.
Alternatively, an immobilized antibody specific for the
protein may be used to anchor the protein to the solid
surface. The surfaces may be prepared in advance and stored.
In order to conduct the assay, the binding partner of
the immobilized species is added to the coated surface with
or without the test compound. After the reaction is
complete, unreacted components are removed (e.g., by washing)
and any complexes formed will remain immobilized on the solid
surface. The detection of complexes anchored on the solid
surface can be accomplished in a number of ways. Where the
binding partner was pre-labeled, the detection of label
immobilized on the surface indicates that complexes were
formed. Where the binding partner is not pre-labeled, an
indirect label can be used to detect complexes anchored on
the surface, e.g., using a labeled antibody specific for the
binding partner (the antibody, in turn, may be directly
labeled or indirectly labeled with a labeled anti-Ig
antibody). Depending upon the order of addition of reaction
components, test compounds which inhibit complex formation or
which disrupt preformed complexes can be detected.

Alternatively, the reaction can be conducted in a liquid
phase in the presence or absence of the test compound, the
reaction products separated from unreacted components, and
complexes detected, e.g., using an immobilized antibody
specific for one binding partner to anchor any complexes
formed in solution, and a labeled antibody specific for the
other binding partner to detect anchored complexes. Again,
depending upon the order of addition of reactants to the
liquid phase, test compounds which inhibit complex formation
or which disrupt preformed complexes can be identified.

In an alternate embodiment of the invention, a
homogeneous assay can be used. In this approach, a preformed
complex of the PKD1 protein and the interactive cellular or
extracellular protein is prepared in which one of the binding
partners is labeled, but the signal generated by the label is
quenched due to complex formation (see, e.g., U.S. Patent
No. 4,109,496 by Rubenstein, which utilizes this approach for
immunoassays). The addition of a test substance that
competes with and displaces one of the binding partners from
the preformed complex will result in the generation of a
signal above background. In this way, test substances which
disrupt PKD1 protein-cellular or extracellular protein
interaction can be identified.
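
Because displacement releases the quenched label, a hit in
this homogeneous format is a reaction whose signal clears the
background of the intact complex. A minimal sketch, assuming
replicate no-compound reactions define the background and
using an arbitrary mean-plus-three-standard-deviations cutoff
(a common screening convention, not something the text
specifies); the compound identifiers are hypothetical.

```python
from statistics import mean, stdev


def displacement_hits(no_compound: list[float], with_compound: dict[str, float],
                      k: float = 3.0) -> dict[str, float]:
    """Return compounds whose signal rises above the quenched-complex background.

    no_compound: signals from preformed, quenched complexes with no test substance
        (at least two replicates are needed for a standard deviation).
    with_compound: hypothetical compound ID -> signal measured after addition.
    k: number of background standard deviations required (an assumed cutoff).
    """
    threshold = mean(no_compound) + k * stdev(no_compound)
    return {cid: sig for cid, sig in with_compound.items() if sig > threshold}


if __name__ == "__main__":
    background = [100.0, 104.0, 96.0, 100.0]       # threshold is about 109.8
    signals = {"cmpd-1": 180.0, "cmpd-2": 105.0}
    print(displacement_hits(background, signals))  # {'cmpd-1': 180.0}
```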

In a particular embodiment, the PKD1 protein can be
prepared for immobilization using recombinant DNA techniques
described in Section 5.1.2.2, supra. For example, the PKD1
coding region can be fused to the glutathione-S-transferase
(GST) gene using the fusion vector pGEX-5X-1, in such a
manner that its binding activity is maintained in the
resulting fusion protein. The interactive cellular or
extracellular protein can be purified and used to raise a
monoclonal antibody, using methods routinely practiced in the
art and described above. This antibody can be labeled with
the radioactive isotope 125I, for example, by methods
routinely practiced in the art. In a heterogeneous assay,
e.g., the GST-PKD1 fusion protein can be anchored to
glutathione-agarose beads. The interactive cellular or
extracellular protein can then be added in the presence or
absence of the test compound in a manner that allows
interaction and binding to occur. At the end of the reaction
period, unbound material can be washed away, and the labeled
monoclonal antibody can be added to the system and allowed to
bind to the complexed binding partners. The interaction
between the PKD1 protein and the interactive cellular or
extracellular protein can be detected by measuring the amount
of radioactivity that remains associated with the
glutathione-agarose beads. A successful inhibition of the
interaction by the test compound will result in a decrease in
measured radioactivity.

Alternatively, the GST-PKD1 fusion protein and the
interactive cellular or extracellular protein can be mixed
together in liquid in the absence of the solid
glutathione-agarose beads. The test compound can be added
either during or after the binding partners are allowed to
interact. This mixture can then be added to the
glutathione-agarose beads and unbound material washed away.
Again, the extent of inhibition of the binding partner
interaction can be detected by adding the labeled antibody
and measuring the radioactivity associated with the beads.

In another embodiment of the invention, these same
techniques can be employed using peptide fragments that
correspond to the binding domains of the PKD1 protein and the
interactive cellular or extracellular protein, respectively,
in place of one or both of the full-length proteins. Any
number of methods routinely practiced in the art can be used
to identify and isolate the protein's binding site. These
methods include, but are not limited to, mutagenesis of one
of the genes encoding the proteins and screening for
disruption of binding in a co-immunoprecipitation assay.
Compensating mutations in the PKD1 gene can be selected.
Sequence analysis of the genes encoding the respective
proteins will reveal the mutations that correspond to the
region of the protein involved in interactive binding.
Alternatively, one protein can be anchored to a solid surface
using methods described in this Section above, and allowed to
interact with and bind to its labeled binding partner, which
has been treated with a proteolytic enzyme, such as trypsin.
After washing, a short, labeled peptide comprising the
binding domain may remain associated with the solid material,
which can be isolated and identified by amino acid
sequencing. Also, once the gene coding for the cellular or
extracellular protein is obtained, short gene segments can be
engineered to express peptide fragments of the protein, which
can then be tested for binding activity and purified or
synthesized.

For example, and not by way of limitation, PKD1 can be
anchored to a solid material as described above in this
section by making a GST-PKD1 fusion protein and allowing it
to bind to glutathione-agarose beads. The interactive
cellular protein can be labeled with a radioactive isotope,
such as 35S, and cleaved with a proteolytic enzyme such as
trypsin. Cleavage products can then be added to the anchored
GST-PKD1 fusion protein and allowed to bind. After washing
away unbound peptides, labeled bound material, representing
the cellular or extracellular protein binding domain, can be
eluted, purified, and analyzed for amino acid sequence by
methods described in Section 5.1.2.2, supra. Peptides so
identified can be produced synthetically or fused to
appropriate facilitative proteins using recombinant DNA
technology, as described in Section 5.1.2.2, supra.
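
Before such an experiment, the fragments expected from the
trypsin treatment can be predicted in silico. The sketch
below is a planning aid only and assumes the common
simplified trypsin rule (cleave after lysine or arginine, but
not before proline) and that 35S is introduced as
[35S]methionine or cysteine, so only fragments containing M or
C would be detected; the sequence used is hypothetical.

```python
import re


def tryptic_peptides(sequence: str) -> list[str]:
    """Simplified trypsin digest: cut after K or R unless the next residue is P."""
    # Zero-width split point: preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence.upper())
    return [p for p in pieces if p]  # drop the empty piece left by a C-terminal K/R


def labeled_peptides(sequence: str) -> list[str]:
    """Fragments expected to carry 35S, assuming labeling via Met/Cys residues."""
    return [p for p in tryptic_peptides(sequence) if "M" in p or "C" in p]


if __name__ == "__main__":
    partner = "MAKPLRGCKWSER"  # hypothetical binding-partner sequence
    print(tryptic_peptides(partner))  # ['MAKPLR', 'GCK', 'WSER']
    print(labeled_peptides(partner))  # ['MAKPLR', 'GCK']
```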

5.8. ASSAYS FOR ADPKD-INHIBITORY ACTIVITY
Any of the binding compounds, including but not limited
to compounds such as those identified in the foregoing assay
systems, may be tested for anti-ADPKD activity. ADPKD, an
autosomal dominant disorder, may involve underexpression of a
wild-type PKD1 allele, or expression of a PKD1 gene product
that exhibits little or no PKD1 activity. In such an
instance, even though the PKD1 gene product is present, the
overall level of normal PKD1 gene product is insufficient and
leads to ADPKD symptoms. As such, "anti-ADPKD activity", as
used herein, may refer to an increase in the level of
expression of the normal PKD1 gene product, to levels wherein
ADPKD symptoms are ameliorated. Additionally, the term may
refer to an increase in the level of normal PKD1 activity in
the cell, to levels wherein ADPKD symptoms are ameliorated.

Alternatively, ADPKD may be caused by the production of
an aberrant mutant form of the PKD1 protein, which either
interferes with the normal allele product or introduces a
novel function into the cell, which then leads to the mutant
phenotype. For example, a mutant PKD1 protein may compete
with the wild-type protein for the binding of a substance
required to relay a signal inside or outside of a cell.
Circumstances such as these are referred to as
"gain-of-function" mutations. It is possible that different
mechanisms occur in different patients, which can lead to
variations in the mutant phenotype.

"Anti-ADPKD activity", as used herein, may refer to a
decrease in the level and/or activity of such a mutant PKD1
protein so that symptoms of ADPKD are ameliorated.

Cell-based and animal model-based assays for the
identification of compounds exhibiting anti-ADPKD activity
are described below.

5.8.1. CELL BASED ASSAYS 
Cells that contain and express mutant PKD1 gene
sequences which encode mutant PKD1 protein, and thus exhibit
cellular phenotypes associated with ADPKD, may be utilized to
identify compounds that possess anti-ADPKD activity. Such
cells may include cell lines consisting of naturally
occurring or engineered cells which express mutant PKD1 gene
products, or which express both normal and mutant PKD1 gene
products. Such cells include, but are not limited to, renal
epithelial cells, including primary and immortalized human
renal tubular cells, MDCK cells, LLC-PK1 cells, and human
renal carcinoma cells.
Cells, such as those described above, which exhibit
ADPKD-like cellular phenotypes, may be exposed to a compound
suspected of exhibiting anti-ADPKD activity at a sufficient
concentration and for a time sufficient to elicit such
anti-ADPKD activity in the exposed cells. After exposure, the
cells are examined to determine whether one or more of the
ADPKD-like cellular phenotypes has been altered to resemble a
more wild-type, non-ADPKD phenotype.
Among the cellular phenotypes which may be followed in
the above assays are differences in the apical/basolateral
distribution of membrane proteins. For example, normal
(i.e., non-ADPKD) renal tubular cells in situ and in culture
under defined conditions have a characteristic pattern of
apical/basolateral distribution of cell surface markers.
ADPKD renal cells, by contrast, exhibit a distribution
pattern that reflects a partially reversed apical/basolateral
polarity relative to the normal distribution. For example,
sodium-potassium ATPase is found on the basolateral membranes
of renal epithelial cells but is found on the apical surface
of ADPKD epithelial cells, both in cystic epithelia in vivo
and in ADPKD cells in culture (Wilson et al., 1991, Am. J.
Physiol. 260:F420-F430). Another marker which exhibits an
alteration in polarity in normal versus ADPKD-affected cells
is the EGF receptor, which is normally located basolaterally
but in ADPKD cells is mislocated to the apical surface. Such
an apical/basolateral marker distribution phenotype may be
followed, for example, by standard immunohistology techniques
using antibodies specific to the marker(s) of interest in
conjunction with procedures that are well known to those of
skill in the art.

Additionally, assays for the function of the PKD1 gene
product can, for example, include a measure of extracellular
matrix (ECM) components, such as proteoglycans, laminin,
fibronectin and the like, in that studies in both ADPKD and
in rat models of acquired cystic disease (Carone, F.A. et
al., 1989, Kidney International 35:1034-1040) have shown
alterations in such components. Thus, any compound which
serves to create an extracellular matrix environment which
more fully mimics the normal ECM should be considered as a
candidate for testing for an ability to ameliorate ADPKD
symptoms.

5.8.2. ANIMAL MODEL ASSAYS
The ability of a compound, such as those identified in
the foregoing binding assays, to prevent or inhibit disease
may be assessed in animal models for ADPKD. Several
naturally occurring mutations for renal cystic disease have
been found in animals. While these are not perfect models of
ADPKD, they provide test systems for assaying the effects of
compounds that interact with PKD1 proteins. Of these models,
the Han:SPRD rat model is the only autosomal dominant
example. Such a model is well known to those of skill in the
art. See, for example, Kaspareit-Rittinghausen et al., 1989,
Vet. Path. 26:195. In addition, several recessive models
exist (Reeders, S., 1992, Nature Genetics 1:235).

Additionally, animal models exhibiting ADPKD-like
symptoms may be engineered by utilizing PKD1 sequences such
as those described, above, in Section 5.1, in conjunction
with techniques for producing transgenic animals that are
well known to those of skill in the art.

Animals of any species, including, but not limited to,
mice, rats, rabbits, guinea pigs, pigs, micro-pigs, goats,
and non-human primates, e.g., baboons, squirrel monkeys,
and chimpanzees, may be used to generate such ADPKD animal
models.

In instances wherein the PKD1 mutation leading to ADPKD
symptoms causes a drop in the level of PKD1 protein or causes
an ineffective PKD1 protein to be made (i.e., the PKD1
mutation is a dominant loss-of-function mutation), various
strategies may be utilized to generate animal models
exhibiting ADPKD-like symptoms. For example, PKD1 knockout
animals, such as mice, may be generated and used to screen
for compounds which exhibit an ability to ameliorate ADPKD
symptoms. Animals may be generated whose cells contain one
inactivated copy of a PKD1 homologue. In such a strategy,
human PKD1 gene sequences may be used to identify a PKD1
homologue within the animal of interest, utilizing techniques
described, above, in Section 5.1. Once such a PKD1 homologue
has been identified, well-known techniques such as those
WO 95/34573 



described, below, in Section 5.8.2.1 may be utilized to
disrupt and inactivate the endogenous PKD1 homolog, and
further, to produce animals which are heterozygous for such
an inactivated PKD1 homolog. Such animals may then be
observed for the development of ADPKD-like symptoms.

In instances wherein a PKD1 mutation causes a PKD1
protein having an aberrant PKD1 activity which leads to ADPKD
symptoms (i.e., the PKD1 mutation is a dominant
gain-of-function mutation), strategies such as those now
described may be utilized to generate ADPKD animal models.
First, for example, a human PKD1 gene sequence containing such
a gain-of-function PKD1 mutation, and encoding such an
aberrant PKD1 protein, may be introduced into the genome of
the animal of interest by utilizing well known techniques such
as those described, below, in Section 5.8.2.1. Such a PKD1
nucleic acid sequence must be controlled by a regulatory
nucleic acid sequence which allows the mutant human PKD1
sequence to be expressed in the cells, preferably kidney
cells, of the animal of interest. The human PKD1 regulatory
promoter/enhancer sequences may be sufficient for such
expression. Alternatively, the mutant PKD1 gene sequences
may be controlled by regulatory sequences endogenous to the
animal of interest, or by any other regulatory sequences
which are effective in bringing about the expression of the
mutant human PKD1 sequences in the animal cells of interest.

Expression of the mutant human PKD1 gene may be assayed,
for example, by standard Northern analysis, and the
production of the mutant human PKD1 gene product may be
assayed by, for example, detecting its presence by utilizing
techniques whereby binding of an antibody directed against
the mutant human PKD1 gene product is detected. Those
animals found to express the mutant human PKD1 gene product
may then be observed for the development of ADPKD-like
symptoms.

Alternatively, animal models of ADPKD may be produced by
engineering animals containing mutations within one copy of
their endogenous PKD1 homologue which correspond to
gain-of-function mutations within the human PKD1 gene.
Utilizing such a strategy, a PKD1 homologue may be identified
and cloned from the animal of interest, using techniques such
as those described, above, in Section 5.1. One or more
gain-of-function mutations may be engineered into such a PKD1
homolog which correspond to gain-of-function mutations within
the human PKD1 gene. By "corresponding", it is meant that the
mutant gene product produced by such an engineered PKD1
homologue will exhibit an aberrant PKD1 activity which is
substantially similar to that exhibited by the mutant human
PKD1 protein.

The engineered PKD1 homologue may then be introduced
into the genome of the animal of interest, using techniques
such as those described, below, in Section 5.8.2.1. Because
the mutation introduced into the engineered PKD1 homologue is
expected to be a dominant gain-of-function mutation,
integration into the genome need not be via homologous
recombination, although such a route is preferred.

Once transgenic animals have been generated, the
expression of the mutant PKD1 homolog gene and protein may be
assayed utilizing standard techniques, such as Northern
and/or Western analyses. Transgenic animals found to express
the mutant PKD1 homolog protein in the cells or tissues of
interest, preferably kidney, may then be observed for the
development of ADPKD-like symptoms.

Any of the ADPKD animal models described herein may be
used to test compounds for an ability to ameliorate ADPKD
symptoms.

In addition, as described in detail in Section 5.11,
infra, such animal models can be used to determine the LD50
and the ED50 in animal subjects, and such data can be used to
determine the in vivo efficacy of potential ADPKD treatments.
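
The LD50 and ED50 are read off dose-response data. As a
hedged illustration only (the procedures of Section 5.11 are
not reproduced here, and probit or logistic fitting would
normally be preferred), the sketch below estimates each 50%
point by linear interpolation on log dose and forms the ratio
LD50/ED50, conventionally called the therapeutic index. The
doses and response fractions are invented for the example.

```python
import math


def dose_at_half_response(doses: list[float], fractions: list[float]) -> float:
    """Dose giving a 50% response, by linear interpolation on log10(dose).

    doses: increasing, positive doses; fractions: fraction of animals responding
    (therapeutic effect for ED50, death for LD50), assumed to rise with dose.
    """
    points = list(zip(doses, fractions))
    for (d0, f0), (d1, f1) in zip(points, points[1:]):
        if f0 <= 0.5 <= f1 and f1 > f0:
            x0, x1 = math.log10(d0), math.log10(d1)
            return 10 ** (x0 + (0.5 - f0) * (x1 - x0) / (f1 - f0))
    raise ValueError("the 50% response is not bracketed by the tested doses")


def therapeutic_index(ld50: float, ed50: float) -> float:
    """Ratio of lethal to effective dose; larger values indicate a wider margin."""
    return ld50 / ed50


if __name__ == "__main__":
    ed50 = dose_at_half_response([1.0, 10.0, 100.0], [0.1, 0.5, 0.9])
    ld50 = dose_at_half_response([100.0, 1000.0, 10000.0], [0.0, 0.25, 0.75])
    print(round(ed50, 1), round(ld50, 1), round(therapeutic_index(ld50, ed50), 1))
    # 10.0 3162.3 316.2
```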

5.8.2.1. PRODUCTION OF PKD1 TRANSGENIC ANIMALS
Any technique known in the art may be used to introduce
a PKD1 gene into animals to produce the founder lines of
transgenic animals. Such techniques include, but are not
limited to, pronuclear microinjection (Hoppe, P.C. and
Wagner, T.E., 1989, U.S. Pat. No. 4,873,191); retrovirus-
mediated gene transfer into germ lines (Van der Putten et
al., 1985, Proc. Natl. Acad. Sci. USA 82:6148-6152); gene
targeting in embryonic stem cells (Thompson et al., 1989,
Cell 56:313-321); electroporation of embryos (Lo, 1983, Mol.
Cell. Biol. 3:1803-1814); and sperm-mediated gene transfer
(Lavitrano et al., 1989, Cell 57:717-723); etc. For a review
of such techniques, see Gordon, 1989, Transgenic Animals,
Intl. Rev. Cytol. 115:171-229, which is incorporated by
reference herein in its entirety.

When it is desired that the PKD1 transgene be integrated
into the chromosomal site of the endogenous PKD1 gene, gene
targeting is preferred. Briefly, when such a technique is to
be utilized, vectors containing some nucleotide sequences
homologous to the endogenous PKD1 gene of interest are
designed for the purpose of integrating, via homologous
recombination with chromosomal sequences, into and disrupting
the function of the nucleotide sequence of the endogenous
PKD1 gene.

Once the PKD1 founder animals are produced, they may be
bred, inbred, outbred, or crossbred to produce colonies of
the particular animal. Examples of such breeding strategies
include, but are not limited to: outbreeding of founder
animals with more than one integration site in order to
establish separate lines; inbreeding of separate lines in
order to produce compound PKD1 transgenics that express the
PKD1 transgene at higher levels because of the effects of
additive expression of each PKD1 transgene; crossing of
heterozygous transgenic animals to produce animals homozygous
for a given integration site in order to both augment
expression and eliminate the possible need for screening of
animals by DNA analysis; crossing of separate homozygous
lines to produce compound heterozygous or homozygous lines;
and breeding animals to different inbred genetic backgrounds
so as to examine effects of modifying alleles on expression
of the PKD1 transgene and the development of ADPKD-like
symptoms. One such approach is to cross the PKD1 founder
animals with a wild-type strain to produce an F1 generation
that exhibits ADPKD symptoms, such as the development of
polycystic kidneys. The F1 generation may then be inbred in
order to develop a homozygous line, if it is found that
homozygous PKD1 transgenic animals are viable.
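
The founder-by-wild-type cross and the subsequent F1
intercross follow ordinary Mendelian expectations when the
transgene sits at a single integration site. The sketch below
tabulates expected offspring genotype fractions; it assumes
one autosomal integration site, Mendelian segregation, and
equal viability of all genotypes ('T' marks the transgene
allele, '-' its absence, a notation chosen for this example).

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent_a: str, parent_b: str) -> dict[str, Fraction]:
    """Expected offspring genotype fractions at one transgene integration site.

    Each parent is written as two alleles, e.g. 'T-' for a hemizygous founder
    and '--' for a wild-type mate. Every pairing of one allele from each parent
    is treated as equally likely (Mendelian segregation, full viability).
    """
    counts = Counter(
        "".join(sorted(a + b, reverse=True))  # normalise so 'T' is listed first
        for a, b in product(parent_a, parent_b)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


if __name__ == "__main__":
    f1 = offspring_genotypes("T-", "--")  # founder x wild type
    f2 = offspring_genotypes("T-", "T-")  # intercross of F1 carriers
    print({g: str(p) for g, p in f1.items()})  # {'T-': '1/2', '--': '1/2'}
    print({g: str(p) for g, p in f2.items()})  # {'TT': '1/4', 'T-': '1/2', '--': '1/4'}
```

If homozygotes prove not to be viable, the surviving
offspring of the intercross would instead be about two-thirds
hemizygous and one-third non-transgenic.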

The present invention provides for transgenic animals
that carry the transgene in all their cells, as well as
animals which carry the transgene in some, but not all, of
their cells, i.e., mosaic animals. The transgene may be
integrated as a single transgene or in concatemers, e.g.,
head-to-head tandems or head-to-tail tandems.

5.8.2.2. SELECTION AND CHARACTERIZATION 
OF THE PKD1 TRANSGENIC ANIMALS 

The PKD1 transgenic animals that are produced in
accordance with the procedures detailed, above, in Section
5.8.2.1 should be screened and evaluated to select those
animals which may be used as suitable animal models for
ADPKD.

Initial screening may be accomplished by Southern blot
analysis or PCR techniques to analyze animal tissues to
verify that integration of the transgene has taken place.
The level of mRNA expression of the transgene in the tissues
of the transgenic animals may also be assessed using
techniques which include, but are not limited to, Northern
blot analysis of tissue samples obtained from the animal, in
situ hybridization analysis, and reverse transcriptase-PCR
(RT-PCR). Samples of PKD1-expressing tissue, kidney tissue,
for example, may be evaluated immunocytochemically using
antibodies specific for the PKD1 transgene product.

The PKD1 transgenic animals that express PKD1 mRNA or
gene product (detected immunocytochemically, using antibodies
directed against PKD1 tag epitopes) at easily detectable
levels should then be further evaluated histopathologically
to identify those animals which display characteristic
ADPKD-like symptoms. Such transgenic animals serve as
suitable model systems for ADPKD.

5.8.2.3. USES OF THE PKD1 ANIMAL MODELS 
The PKD1 animal models of the invention may be used as
model systems for ADPKD and/or to generate cell lines that
can be used as cell culture models for this disorder.

The PKD1 transgenic animal model systems for ADPKD may
be used as a test substrate to identify drugs,
pharmaceuticals, therapies and interventions which may be
effective in treating such a disorder. Potential therapeutic
agents may be tested by systemic or local administration.
Suitable routes may include oral, rectal, or intestinal
administration; parenteral delivery, including intramuscular,
subcutaneous, and intramedullary injections, as well as
intrathecal, direct intraventricular, intravenous,
intraperitoneal, intranasal, or intraocular injections, to
name a few. The response of the animals to the treatment may
be monitored by assessing the reversal of disorders
associated with ADPKD. With regard to intervention, any
treatments which reverse any aspect of ADPKD-like symptoms
should be considered as candidates for human ADPKD
therapeutic intervention. However, treatments or regimens
which reverse the constellation of pathologies associated
with any of these disorders may be preferred. Dosages of
test agents may be determined by deriving dose-response
curves, as discussed in Section 5.11, below.

In an alternate embodiment, the PKD1 transgenic animals
of the invention may be used to derive a cell line which may
be used as a test substrate in culture, to identify agents
that ameliorate ADPKD-like symptoms. While primary cultures
derived from the PKD1 transgenic animals of the invention may
be utilized, the generation of continuous cell lines is
preferred. For examples of techniques which may be used to
derive a continuous cell line from the transgenic animals,
see Small et al., 1985, Mol. Cell Biol. 5:642-648.

5.9. COMPOUNDS THAT INHIBIT EXPRESSION,
SYNTHESIS OR ACTIVITY OF MUTANT
PKD1 ACTIVITY

As discussed above, dominant mutations in the PKD1 gene
that cause ADPKD may act as gain-of-function mutations which
produce a form of the PKD1 protein which exhibits an aberrant
activity that leads to the formation of ADPKD symptoms. A
variety of techniques may be utilized to inhibit the
expression, synthesis, or activity of such mutant PKD1 genes
and gene products (i.e., proteins).

For example, compounds such as those identified through
assays described, above, in Section 5.4, which exhibit
inhibitory activity, may be used in accordance with the
invention to ameliorate ADPKD symptoms. Such molecules may
include, but are not limited to, small and large organic
molecules, peptides, and antibodies. Inhibitory antibody
techniques are described, below, in Section 5.9.2.

Further, antisense and ribozyme molecules which inhibit
expression of the PKD1 gene, preferably the mutant PKD1 gene,
may also be used to inhibit the aberrant PKD1 activity. Such
techniques are described, below, in Section 5.9.1. Still
further, as described, below, in Section 5.9.1, triple helix
molecules may be utilized in inhibiting the aberrant PKD1
activity.

5.9.1. INHIBITORY ANTISENSE, RIBOZYME
AND TRIPLE HELIX APPROACHES

Among the compounds which may exhibit anti-ADPKD
activity are antisense, ribozyme, and triple helix molecules.
Such molecules may be designed to reduce or inhibit mutant
PKD1 activity. Techniques for the production and use of such
molecules are well known to those of skill in the art.

Antisense RNA and DNA molecules act to directly block
the translation of mRNA by binding to targeted mRNA and
preventing protein translation. With respect to antisense
DNA, oligodeoxyribonucleotides derived from the translation
initiation site, e.g., between the -10 and +10 regions of
the PKD1 nucleotide sequence of interest, are preferred.
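
Selecting a -10 to +10 antisense oligodeoxyribonucleotide is a
small sequence manipulation. A minimal sketch, assuming the
caller supplies the 0-based position of the A of the
initiation AUG and that positions are numbered with that A as
+1 and no position 0 (so the window is the 10 nt upstream plus
the first 10 coding nt); the input is a hypothetical sequence,
not PKD1 sequence.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def antisense_at_start(mrna: str, start: int, upstream: int = 10,
                       downstream: int = 10) -> str:
    """Antisense DNA (5'->3') covering positions -upstream..+downstream of the AUG.

    mrna: sense sequence (RNA or DNA letters); start: 0-based index of the A of AUG.
    """
    seq = mrna.upper().replace("U", "T")
    if seq[start:start + 3] != "ATG":
        raise ValueError("no AUG at the given start position")
    if start < upstream or start + downstream > len(seq):
        raise ValueError("not enough sequence around the start codon")
    target = seq[start - upstream:start + downstream]
    # Reverse complement so the oligo pairs antiparallel with the mRNA.
    return "".join(COMPLEMENT[base] for base in reversed(target))


if __name__ == "__main__":
    mrna = "AAAAACCCCCAUGGGGUUUAGG"  # hypothetical 5' UTR + start of coding region
    print(antisense_at_start(mrna, start=10))  # TAAACCCCATGGGGGTTTTT
```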

Ribozymes are enzymatic RNA molecules capable of
catalyzing the specific cleavage of RNA. The mechanism of
ribozyme action involves sequence-specific hybridization of
the ribozyme molecule to complementary target RNA, followed
by an endonucleolytic cleavage. The composition of ribozyme
molecules must include one or more sequences complementary to
the target PKD1 mRNA, preferably the mutant PKD1 mRNA, and
must include the well known catalytic sequence responsible
for mRNA cleavage. For this sequence, see U.S. Pat. No.
5,093,246, which is incorporated by reference herein in its
entirety. As such, within the scope of the invention are
engineered hammerhead motif ribozyme molecules that
specifically and efficiently catalyze endonucleolytic
cleavage of RNA sequences encoding PKD1, preferably mutant
PKD1 proteins.

Specific ribozyme cleavage sites within any potential
RNA target are initially identified by scanning the target
molecule for ribozyme cleavage sites which include the
following sequences: GUA, GUU and GUC. Once identified,
short RNA sequences of between 15 and 20 ribonucleotides
corresponding to the region of the target gene containing the
cleavage site may be evaluated for predicted structural
features, such as secondary structure, that may render the
oligonucleotide sequence unsuitable. The suitability of
candidate targets may also be evaluated by testing their
accessibility to hybridization with complementary
oligonucleotides, using ribonuclease protection assays.
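
The initial scan for candidate cleavage sites can be sketched as follows. This minimal illustration only locates GUA, GUU and GUC triplets and extracts a 15 to 20 nt window around each. It does not predict secondary structure or accessibility, which still have to be evaluated separately. The function name and the rule that centres the window on the triplet are assumptions made for illustration.

```python
CLEAVAGE_TRIPLETS = {"GUA", "GUU", "GUC"}


def find_ribozyme_sites(rna: str, window: int = 15) -> list[tuple[int, str, str]]:
    """Return (index, triplet, window_sequence) for each candidate site.

    `index` is the 0-based position of the G of the triplet. The window is
    centred on the triplet where possible and clamped to the target ends.
    """
    if not 15 <= window <= 20:
        raise ValueError("window must be 15 to 20 nucleotides")
    seq = rna.upper().replace("T", "U")  # accept DNA-style input as well
    sites = []
    for i in range(len(seq) - 2):
        triplet = seq[i:i + 3]
        if triplet in CLEAVAGE_TRIPLETS:
            lo = max(0, min(i + 1 - window // 2, len(seq) - window))
            sites.append((i, triplet, seq[lo:lo + window]))
    return sites


target = "AAGUAAAGUCAAGUGAA"  # hypothetical toy sequence, not PKD1
for index, triplet, region in find_ribozyme_sites(target):
    print(index, triplet, region)
```

Here the GUG at index 12 is skipped because it is not one of the three listed triplets.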

Nucleic acid molecules to be used in triple helix
formation should be single stranded and composed of
deoxynucleotides. The base composition of these
oligonucleotides must be designed to promote triple helix
formation via Hoogsteen base pairing rules, which generally
require sizeable stretches of either purines or pyrimidines
to be present on one strand of a duplex. Nucleotide
sequences may be pyrimidine-based, which will result in TAT
and CGC+ triplets across the three associated strands of the
resulting triple helix. The pyrimidine-rich molecules provide
base complementarity to a purine-rich region of a single
strand of the duplex in a parallel orientation to that
strand. In addition, nucleic acid molecules may be chosen
that are purine-rich, for example, that contain a stretch of
guanine residues. These molecules will form a triple helix
with a DNA duplex that is rich in GC pairs, in which the
majority of the purine residues are located on a single
strand of the targeted duplex, resulting in GGC triplets
across the three strands in the triplex.
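
Finding a purine stretch on either strand of a duplex and deriving the matching pyrimidine-motif third strand can be sketched as follows. The sketch applies only the pyrimidine-based pairing described above: T recognizes A of an A-T pair (TAT) and protonated C recognizes G of a G-C pair (CGC+). The third strand is written 5' to 3' in the same, parallel orientation as the purine-rich strand. The 10 nt minimum tract length and the function names are illustrative assumptions, and the purine-rich (GGC) case is not covered.

```python
import re

DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Pyrimidine motif: T recognises A (TAT triplet), C+ recognises G (CGC+ triplet).
PYRIMIDINE_THIRD_STRAND = {"A": "T", "G": "C"}


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def purine_tracts(top: str, min_len: int = 10) -> list[tuple[str, int, str]]:
    """Find purine-only stretches on either strand of a duplex.

    Returns (strand, 0-based start on that strand read 5'->3', tract).
    A pyrimidine stretch on the top strand is a purine stretch on the
    bottom strand, so both strands are scanned.
    """
    hits = []
    for strand, seq in (("top", top.upper()), ("bottom", reverse_complement(top))):
        for m in re.finditer(r"[AG]+", seq):
            if len(m.group()) >= min_len:
                hits.append((strand, m.start(), m.group()))
    return hits


def pyrimidine_third_strand(purine_tract: str) -> str:
    """Third strand, 5'->3', parallel to the purine-rich target strand."""
    return "".join(PYRIMIDINE_THIRD_STRAND[b] for b in purine_tract.upper())


duplex_top = "CCTTAGAGAGGAAGGATCC"  # hypothetical toy duplex (top strand), not PKD1
for strand, start, tract in purine_tracts(duplex_top):
    print(strand, start, tract, pyrimidine_third_strand(tract))
```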

Alternatively, the potential sequences that can be
targeted for triple helix formation may be increased by
creating a so-called "switchback" nucleic acid molecule.
Switchback molecules are synthesized in an alternating 5'-3',
3'-5' manner, such that they base pair with one strand of a
duplex first and then the other, eliminating the necessity
for a sizeable stretch of either purines or pyrimidines to be
present on one strand of a duplex.

It is possible that the antisense, ribozyme, and/or
triple helix molecules described herein may reduce or inhibit
the translation of mRNA produced by both normal and mutant
PKD1 alleles. In order to ensure that substantially normal
levels of PKD1 activity are maintained in the cell, nucleic
acid molecules that encode and express PKD1 polypeptides
exhibiting normal PKD1 activity, and that do not contain
sequences susceptible to whatever antisense, ribozyme, or
triple helix treatment is being used, may be introduced into
cells. Such sequences may be introduced via gene therapy
methods such as those described, below, in Section 5.5.
Alternatively, it may be preferable to coadminister normal
PKD1 protein into the cell or tissue in order to maintain the
requisite level of cellular or tissue PKD1 activity.
Antisense RNA and DNA molecules, ribozyme molecules and
triple helix molecules of the invention may be prepared by
any method known in the art for the synthesis of DNA and RNA
molecules. These include techniques for chemically
synthesizing oligodeoxyribonucleotides and
oligoribonucleotides well known in the art, such as, for
example, solid phase phosphoramidite chemical synthesis.
Alternatively, RNA molecules may be generated by in vitro and
in vivo transcription of DNA sequences encoding the antisense
RNA molecule. Such DNA sequences may be incorporated into a
wide variety of vectors which incorporate suitable RNA
polymerase promoters such as the T7 or SP6 polymerase
promoters. Alternatively, antisense cDNA constructs that
synthesize antisense RNA constitutively or inducibly,
depending on the promoter used, can be introduced stably into
cell lines.

Various well-known modifications to the DNA molecules
may be introduced as a means of increasing intracellular
stability and half-life. Possible modifications include, but
are not limited to, the addition of flanking sequences of
ribo- or deoxy- nucleotides to the 5' and/or 3' ends of the
molecule or the use of phosphorothioate or 2'-O-methyl rather
than phosphodiester linkages within the
oligodeoxyribonucleotide backbone.

5.9.2. ANTIBODIES THAT REACT WITH PKD1 GENE PRODUCT 
Antibodies that are both specific for mutant PKD1 gene
product and interfere with its activity may be used. Such
antibodies may be generated using standard techniques
described in Section 5.3, supra, against the proteins
themselves or against peptides corresponding to the binding
domains of the proteins. Such antibodies include but are not
limited to polyclonal, monoclonal, Fab fragments, F(ab')2
fragments, single chain antibodies, chimeric antibodies,
humanized antibodies, etc.

The PKD1 protein appears to be an extracellular protein.
Therefore, any of the administration techniques described,
below, in Section 5.11 which are appropriate for peptide
administration may be utilized to effectively administer
inhibitory PKD1 antibodies to their site of action.

5.10. METHODS FOR RESTORING PKD1 ACTIVITY

As discussed above, dominant mutations in the PKD1 gene
that cause ADPKD may lower the level of expression of the
PKD1 gene or, alternatively, may cause inactive or
substantially inactive PKD1 proteins to be formed. In either
instance, the result is an overall lower level of normal PKD1
activity in the tissues or cells in which PKD1 is normally
expressed. This lower level of PKD1 activity, then, leads to
ADPKD symptoms. Thus, such PKD1 mutations represent dominant
loss-of-function mutations. Described in this Section are
methods whereby the level of normal PKD1 activity may be
increased to levels wherein ADPKD symptoms are ameliorated.

For example, normal PKD1 protein, at a level sufficient
to ameliorate ADPKD symptoms, may be administered to a patient
exhibiting such symptoms. Any of the techniques discussed,
below, in Section 5.11, may be utilized for such
administration. One of skill in the art will readily know
how to determine the concentration of effective, non-toxic
doses of the normal PKD1 protein, utilizing techniques such
as those described, below, in Section 5.11.

Additionally, DNA sequences encoding normal PKD1 protein
may be directly administered to a patient exhibiting ADPKD
symptoms, at a concentration sufficient to produce a level of
PKD1 protein such that ADPKD symptoms are ameliorated. Any
of the techniques discussed, below, in Section 5.11, which
achieve intracellular administration of compounds, such as,
for example, liposome administration, may be utilized for the
administration of such DNA molecules. The DNA molecules may
be produced, for example, by recombinant techniques such as
those described, above, in Section 5.1, and its subsections.

Further, patients with these types of mutations may be
treated by gene replacement therapy. A copy of the normal
PKD1 gene, or a part of the gene that directs the production
of a PKD1 protein with the function of the normal PKD1
protein, may be inserted into cells, for example, renal cells,
using viral or non-viral vectors which include, but are not
limited to, vectors derived from, for example, retroviruses,
vaccinia virus, adeno-associated virus, herpes viruses, or
bovine papilloma virus, or additional non-viral vectors, such
as plasmids. In addition, techniques frequently employed by
those skilled in the art for introducing DNA into mammalian
cells may be utilized. For example, methods including but
not limited to electroporation, DEAE-dextran mediated DNA
transfer, DNA guns, liposomes, direct injection, and the like
may be utilized to transfer recombinant vectors into host
cells. Alternatively, the DNA may be transferred into cells
through conjugation to proteins that are normally targeted to
the inside of a cell. For example, the DNA may be conjugated
to viral proteins that normally target viral particles into
the targeted host cell. Additionally, techniques such as
those described in Sections 5.1 and 5.2 and their
subsections, above, may be utilized for the introduction of
normal PKD1 gene sequences into human cells.

The PKD1 gene is very large and, further, encodes a very
large, approximately 14 kb, transcript. Additionally, the
PKD1 gene product is large, having 4304 amino acids, with a
molecular weight of about 467 kD. It is possible, therefore,
that the introduction of the entire PKD1 coding region may be
cumbersome and potentially inefficient as a gene therapy
approach. However, because the entire PKD1 gene product may
not be necessary to avoid the appearance of ADPKD symptoms,
the use of a "minigene" therapy approach (see, e.g., Ragot,
T. et al., 1993, Nature 361:647; Dunckley, M.G. et al., 1993,
Hum. Mol. Genet. 2:717-723) can serve to ameliorate such
ADPKD symptoms.

Such a minigene system comprises the use of a portion of
the PKD1 coding region which encodes a partial, yet active or
substantially active, PKD1 gene product. As used herein,
"substantially active" signifies that the gene product serves
to ameliorate ADPKD symptoms. Thus, the minigene system
utilizes only that portion of the normal PKD1 gene which
encodes a portion of the PKD1 gene product capable of
ameliorating ADPKD symptoms, and may, therefore, represent an
effective and even more efficient ADPKD gene therapy than
full-length gene therapy approaches. Such a minigene can be
inserted into cells and utilized via the procedures
described, above, for full-length gene replacement. The
cells into which the PKD1 minigene is to be introduced are,
preferably, those cells, such as renal cells, which are
affected by ADPKD. Alternatively, any suitable cell can be
transfected with a PKD1 minigene as long as the minigene is
expressed in a sustained, stable fashion and produces a gene
product that ameliorates ADPKD symptoms. Regulatory
sequences by which such a PKD1 minigene can be successfully
expressed will vary depending upon the cell into which the
minigene is introduced. The skilled artisan will be aware of
appropriate regulatory sequences for the given cell to be
used. Techniques for such introduction and sustained
expression are routine and are well known to those of skill
in the art.

A therapeutic minigene for the amelioration of ADPKD
symptoms can comprise a nucleotide sequence which encodes at
least one PKD1 gene product peptide domain, as shown in FIGS.
7A-7B and 8. For example, such PKD1 peptide domains (the
approximate amino acid residue positions of which are listed
in parentheses after each domain name) can include a
leucine-rich repeat domain (72 to 94, or 97 to 119) and/or a
cysteine-rich repeat domain (32 to 65), a C-type (calcium
dependent) lectin protein domain (405 to 534), an LDL-A
module (641 to 671), one or more PKD domains (282 to 353;
1032 to 1124; 1138 to 1209; 1221 to 1292; 1305 to 1377; 1390
to 1463; 1477 to 1545; 1559 to 1629; 1643 to 1715; 1729 to
1799; 1815 to 1884; 1898 to 1968; 1983 to 2058; 2071 to
2142), or at least one C-terminal domain (2160 to 4304)
(i.e., a peptide domain found in the C-terminal half of the
PKD1 gene product). Minigenes which encode such PKD1 gene
products can be synthesized and/or engineered using the PKD1
gene sequence (SEQ ID NO:1) disclosed herein, and by
utilizing the amino acid residue domain designations found in
FIGS. 7A-7B and 8.
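
Because the case for a minigene rests on the size of the full PKD1 coding region, it can help to compute how much coding sequence a given combination of domains would need. The sketch below uses a subset of the approximate, inclusive residue ranges listed above and assumes three nucleotides per residue plus an optional stop codon. It ignores any start codon, signal sequence, linker or regulatory sequence a real construct would require, and the function name is hypothetical.

```python
# Approximate PKD1 gene product domain boundaries (residues, inclusive),
# taken from the ranges listed in the text; only a subset is shown.
DOMAINS = {
    "cysteine-rich repeat": [(32, 65)],
    "leucine-rich repeats": [(72, 94), (97, 119)],
    "first PKD domain": [(282, 353)],
    "C-type lectin": [(405, 534)],
    "LDL-A module": [(641, 671)],
    "C-terminal region": [(2160, 4304)],
}
FULL_LENGTH_RESIDUES = 4304


def coding_length_nt(domain_names: list[str], include_stop: bool = True) -> int:
    """Nucleotides needed to encode the named domains back to back."""
    residues = sum(end - start + 1
                   for name in domain_names
                   for start, end in DOMAINS[name])
    return 3 * residues + (3 if include_stop else 0)


full_length_nt = 3 * FULL_LENGTH_RESIDUES + 3  # includes the stop codon
mini = coding_length_nt(["cysteine-rich repeat", "leucine-rich repeats", "C-type lectin"])
print(mini, full_length_nt, round(mini / full_length_nt, 3))
```

For this hypothetical combination the coding sequence is 633 nt, compared with 12,915 nt for the full-length 4304-residue product.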

One way in which PKD1 minigene product activity can be
assayed involves the use of PKD1 knockout animal models. Such
animal models express an insufficient level of the PKD1 gene
product. The production of such animal models may be as
described above, in Section 5.8.2, and involves methods well
known to those of skill in the art. PKD1 minigenes can be
introduced into the PKD1 knockout animal models as, for
example, described above, in this Section. The activity of
the minigene can then be assessed by assaying for the
amelioration of ADPKD-like symptoms. Thus, the relative
importance of each of the PKD1 peptide domains, individually
and/or in combination, with respect to PKD1 gene activity can
be determined.

Cells, preferably autologous cells, containing normal
PKD1-expressing gene sequences may then be introduced or
reintroduced into the patient at positions which allow for
the amelioration of ADPKD symptoms. Such cell replacement
techniques may be preferred, for example, when the PKD1 gene
product is a secreted, extracellular gene product.

5.11. PHARMACEUTICAL PREPARATIONS
AND METHODS OF ADMINISTRATION

The identified compounds that inhibit PKD1 expression,
synthesis and/or activity can be administered to a patient at
therapeutically effective doses to treat polycystic kidney
disease. A therapeutically effective dose refers to that
amount of the compound sufficient to result in amelioration
of symptoms of polycystic kidney disease.

5.11.1. EFFECTIVE DOSE 
Toxicity and therapeutic efficacy of such compounds can
be determined by standard pharmaceutical procedures in cell
cultures or experimental animals, e.g., for determining the
LD50 (the dose lethal to 50% of the population) and the ED50
(the dose therapeutically effective in 50% of the
population). The dose ratio between toxic and therapeutic
effects is the therapeutic index and it can be expressed as
the ratio LD50/ED50. Compounds which exhibit large therapeutic
indices are preferred. While compounds that exhibit toxic
side effects may be used, care should be taken to design a
delivery system that targets such compounds to the site of
affected tissue in order to minimize potential damage to
unaffected cells and, thereby, reduce side effects.
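
The therapeutic index is simple arithmetic, and the sketch below computes LD50/ED50 and ranks candidate compounds by it, largest (preferred) first. The compound names and dose values are hypothetical placeholders, not data from this invention, and both doses must be in the same units (e.g., mg/kg) for the ratio to be meaningful.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Therapeutic index = LD50 / ED50 (both doses in the same units)."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


def rank_by_therapeutic_index(doses: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Sort compounds from largest (preferred) to smallest therapeutic index."""
    indices = [(name, therapeutic_index(ld50, ed50)) for name, (ld50, ed50) in doses.items()]
    return sorted(indices, key=lambda item: item[1], reverse=True)


# Hypothetical (LD50, ED50) pairs in mg/kg, for illustration only.
candidates = {"compound_A": (200.0, 10.0), "compound_B": (50.0, 5.0)}
print(rank_by_therapeutic_index(candidates))
# [('compound_A', 20.0), ('compound_B', 10.0)]
```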

The data obtained from the cell culture assays and
animal studies can be used in formulating a range of dosage
for use in humans. The dosage of such compounds lies
preferably within a range of circulating concentrations that
include the ED50 with little or no toxicity. The dosage may
vary within this range depending upon the dosage form
employed and the route of administration utilized. For any
compound used in the method of the invention, the
therapeutically effective dose can be estimated initially
from cell culture assays. A dose may be formulated in animal
models to achieve a circulating plasma concentration range
that includes the IC50 (i.e., the concentration of the test
compound which achieves a half-maximal inhibition of
symptoms) as determined in cell culture. Such information
can be used to more accurately determine useful doses in
humans. Levels in plasma may be measured, for example, by
high performance liquid chromatography. Additional factors
which may be utilized to optimize dosage can include, for
example, such factors as the severity of the ADPKD symptoms
as well as the age, weight and possible additional disorders
which the patient may also exhibit. Those skilled in the art
will be able to determine the appropriate dose based on the
above factors.
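
Estimating the IC50 from cell culture dose-response data can be sketched by interpolating linearly on a log-concentration scale between the two measured points that bracket 50% inhibition. This is a deliberately simple stand-in rather than the method of the invention; fitting a sigmoidal (e.g., four-parameter logistic) curve is the more usual practice. The sketch assumes positive concentrations and inhibition that rises with dose, and the data shown are hypothetical.

```python
import math


def estimate_ic50(concentrations: list[float], inhibition_pct: list[float]) -> float:
    """Log-linear interpolation of the concentration giving 50% inhibition."""
    points = sorted(zip(concentrations, inhibition_pct))
    for (c1, i1), (c2, i2) in zip(points, points[1:]):
        # Use the first pair of adjacent doses whose responses bracket 50%.
        if i1 <= 50.0 <= i2 and i2 > i1:
            fraction = (50.0 - i1) / (i2 - i1)
            log_ic50 = math.log10(c1) + fraction * (math.log10(c2) - math.log10(c1))
            return 10 ** log_ic50
    raise ValueError("50% inhibition is not bracketed by the data")


# Hypothetical data: concentrations in micromolar, responses in % inhibition.
print(estimate_ic50([1.0, 10.0, 100.0], [20.0, 40.0, 80.0]))  # approximately 17.8
```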

5.11.2. FORMULATIONS AND USE 
Pharmaceutical compositions for use in accordance with
the present invention may be formulated in conventional
manner using one or more physiologically acceptable carriers
or excipients.

Thus, the compounds and their physiologically acceptable
salts and solvates may be formulated for administration by
inhalation or insufflation (either through the mouth or the
nose) or oral, buccal, parenteral or rectal administration.

For oral administration, the pharmaceutical compositions
may take the form of, for example, tablets or capsules
prepared by conventional means with pharmaceutically
acceptable excipients such as binding agents (e.g.,
pregelatinised maize starch, polyvinylpyrrolidone or
hydroxypropyl methylcellulose); fillers (e.g., lactose,
microcrystalline cellulose or calcium hydrogen phosphate);
lubricants (e.g., magnesium stearate, talc or silica);
disintegrants (e.g., potato starch or sodium starch
glycollate); or wetting agents (e.g., sodium lauryl
sulphate). The tablets may be coated by methods well known
in the art. Liquid preparations for oral administration may
take the form of, for example, solutions, syrups or
suspensions, or they may be presented as a dry product for
constitution with water or other suitable vehicle before use.
Such liquid preparations may be prepared by conventional
means with pharmaceutically acceptable additives such as
suspending agents (e.g., sorbitol syrup, cellulose
derivatives or hydrogenated edible fats); emulsifying agents
(e.g., lecithin or acacia); non-aqueous vehicles (e.g.,
almond oil, oily esters, ethyl alcohol or fractionated
vegetable oils); and preservatives (e.g., methyl or
propyl-p-hydroxybenzoates or sorbic acid). The preparations
may also contain buffer salts, flavoring, coloring and
sweetening agents as appropriate.

Preparations for oral administration may be suitably
formulated to give controlled release of the active compound.

For buccal administration the compositions may take the
form of tablets or lozenges formulated in conventional
manner.
PKD1 nucleotide sequences, either RNA or DNA, may, for
example, be used in hybridization or amplification assays of
biological samples to detect abnormalities of PKD1
expression, e.g., Southern or Northern analysis, single
stranded conformational polymorphism (SSCP) analysis,
in situ hybridization assays or, alternatively, polymerase
chain reaction analyses. Such analyses may reveal
quantitative abnormalities in the expression pattern of
the PKD1 gene and, if the PKD1 mutation is, for example, an
extensive deletion or the result of a chromosomal
rearrangement, may also reveal more qualitative aspects of
the PKD1 abnormality.

Preferred diagnostic methods for the detection of PKD1
specific nucleic acid molecules may involve, for example,
contacting and incubating nucleic acids, derived from the
target tissue being analyzed, with one or more labeled
nucleic acid reagents as are described in Section 5.1, under
conditions favorable for the specific annealing of these
reagents to their complementary sequences within the target
molecule. Preferably, the lengths of these nucleic acid
reagents are at least 15 to 30 nucleotides. After incubation,
all non-annealed nucleic acids are removed. The presence of
nucleic acids from the target tissue which have hybridized,
if any such molecules exist, is then detected. Using such a
detection scheme, the target tissue nucleic acid may be
immobilized, for example, to a solid support such as a
membrane, or a plastic surface such as that on a microtiter
plate or polystyrene beads. In this case, after incubation,
non-annealed, labeled nucleic acid reagents of the type
described in Section 5.1 and its subsections are easily
removed. Detection of the remaining, annealed, labeled
nucleic acid reagents is accomplished using standard
techniques well known to those in the art.

Alternative diagnostic methods for the detection of PKD1
specific nucleic acid molecules may involve their
amplification, e.g., by PCR (the experimental embodiment set
forth in Mullis, K.B., 1987, U.S. Patent No. 4,683,202),
ligase chain reaction (Barany, F., 1991, Proc. Natl. Acad.
Sci. USA 88:189-193), self sustained sequence replication
(Guatelli, J.C. et al., 1990, Proc. Natl. Acad. Sci. USA
87:1874-1878), transcriptional amplification system (Kwoh,
D.Y. et al., 1989, Proc. Natl. Acad. Sci. USA 86:1173-1177),
Q-Beta Replicase (Lizardi, P.M. et al., 1988, Bio/Technology
6:1197), or any other RNA amplification method, followed by
the detection of the amplified molecules using techniques
well known to those of skill in the art. These detection
schemes are especially useful for the detection of RNA
molecules if such molecules are present in very low numbers.

In one embodiment of such a detection scheme, a cDNA
molecule is obtained from the target RNA molecule (e.g., by
reverse transcription of the RNA molecule into cDNA).
Tissues from which such RNA may be isolated include any
tissue in which wild type PKD1 is known to be expressed,
including, but not limited to, kidney tissue and lymphocyte
tissue. A target sequence within the cDNA is then used as
the template for a nucleic acid amplification reaction, such
as a PCR amplification reaction, or the like. The nucleic
acid reagents used as synthesis initiation reagents (e.g.,
primers) in the reverse transcription and nucleic acid
amplification steps of this method are chosen from among the
PKD1 nucleic acid reagents described in Section 5.1 and its
subsections. The preferred lengths of such nucleic acid
reagents are at least 15-30 nucleotides. For detection of
the amplified product, the nucleic acid amplification may be
performed using radioactively or non-radioactively labeled
nucleotides. Alternatively, enough amplified product may be
made such that the product may be visualized by standard
ethidium bromide staining or by utilizing any other suitable
nucleic acid staining method.
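
Choosing a primer pair that flanks a target region of a cDNA can be sketched as follows. The forward primer is copied from the start of the region, the reverse primer is the reverse complement of its end, and the expected amplicon size follows from the region boundaries. The length check reflects the 15 to 30 nucleotide range given above. The melting temperature uses the Wallace rule (2 °C per A or T, 4 °C per G or C), a rough approximation for short oligonucleotides that this sketch adds as an assumption; it is not part of the described method. The function names and toy sequence are hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def design_primer_pair(cdna: str, start: int, end: int, length: int = 20) -> dict:
    """Primers flanking cdna[start:end] (0-based, end exclusive)."""
    if not 15 <= length <= 30:
        raise ValueError("primer length should be 15 to 30 nucleotides")
    if start < 0 or end > len(cdna) or end - start < 2 * length:
        raise ValueError("target region is out of range or too short")
    forward = cdna[start:start + length].upper()
    reverse = reverse_complement(cdna[end - length:end])
    return {
        "forward": forward,
        "reverse": reverse,
        "amplicon_nt": end - start,
        "tm_forward": wallace_tm(forward),
        "tm_reverse": wallace_tm(reverse),
    }


toy_cdna = "ATGGCTAGCTTACCGGATCCAAGCTTGGTACCGAATTCTAG"  # hypothetical, not PKD1
print(design_primer_pair(toy_cdna, 0, len(toy_cdna), length=15))
```

The amplicon size is what one would expect to see as a band after ethidium bromide staining.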

5.12.2. DETECTION OF PKD1 GENE PRODUCT AND PEPTIDES

Antibodies directed against wild type or mutant PKD1
gene product or peptides, which are discussed, above, in
Section 5.3, may also be used as ADPKD diagnostics, as
described, for example, herein. Such diagnostic methods may
be used to detect abnormalities in the level of PKD1 protein
expression, or abnormalities in the tissue, cellular, or
subcellular location of PKD1 protein. In addition,
differences in the size, electronegativity, or antigenicity
of the mutant PKD1 protein relative to the normal PKD1
protein may also be detected. Protein from the tissue to be
analyzed may easily be isolated using techniques which are
well known to those of skill in the art. The protein
isolation methods employed herein may, for example, be such
as those described in Harlow and Lane (Harlow, E. and Lane,
D., 1988, "Antibodies: A Laboratory Manual", Cold Spring
Harbor Laboratory Press, Cold Spring Harbor, New York), which
is incorporated herein by reference in its entirety.

Preferred diagnostic methods for the detection of wild
type or mutant PKD1 gene product or peptide molecules may
involve, for example, immunoassays wherein PKD1 peptides are
detected by their interaction with an anti-PKD1 specific
peptide antibody.

For example, antibodies, or fragments of antibodies,
such as those described, above, in Section 5.3, useful in the
present invention may be used to quantitatively or
qualitatively detect the presence of wild type or mutant PKD1
peptides. This can be accomplished, for example, by
immunofluorescence techniques employing a fluorescently
labeled antibody (see below) coupled with light microscopic,
flow cytometric, or fluorimetric detection. Such techniques
are especially preferred if PKD1 gene products or peptides
are expressed on the cell surface.

The antibodies (or fragments thereof) useful in the
present invention may, additionally, be employed
histologically, as in immunofluorescence or immunoelectron
microscopy, for in situ detection of PKD1 gene product or
peptides. In situ detection may be accomplished by removing
a histological specimen from a patient, and applying thereto
a labeled antibody of the present invention. The
histological sample may be taken from a tissue suspected of
exhibiting ADPKD. The antibody (or fragment) is preferably
applied by overlaying the labeled antibody (or fragment) onto
a biological sample. Through the use of such a procedure, it
is possible to determine not only the presence of the PKD1
peptides, but also their distribution in the examined tissue.
Using the present invention, those of ordinary skill will
readily perceive that any of a wide variety of histological
methods (such as staining procedures) can be modified in
order to achieve such in situ detection.

Immunoassays for wild type or mutant PKD1 gene product
or peptides typically comprise incubating a biological
sample, such as a biological fluid, a tissue extract, freshly
harvested cells, or cells which have been incubated in tissue
culture, in the presence of a detectably labeled antibody
capable of identifying PKD1 peptides, and detecting the bound
antibody by any of a number of techniques well known in the
art.

The biological sample may be brought in contact with and
immobilized onto a solid phase support or carrier such as
nitrocellulose, or other solid support which is capable of
immobilizing cells, cell particles or soluble proteins. The
support may then be washed with suitable buffers followed by
treatment with the detectably labeled PKD1 specific antibody.
The solid phase support may then be washed with the buffer a
second time to remove unbound antibody. The amount of bound
label on the solid support may then be detected by
conventional means.

By "solid phase support or carrier" is intended any
support capable of binding an antigen or an antibody.
Well-known supports or carriers include glass, polystyrene,
polypropylene, polyethylene, dextran, nylon, amyloses,
natural and modified celluloses, polyacrylamides, gabbros,
and magnetite. The nature of the carrier can be either
soluble to some extent or insoluble for the purposes of the
present invention. The support material may have virtually
any possible structural configuration so long as the coupled
molecule is capable of binding to an antigen or antibody.
Thus, the support configuration may be spherical, as in a
bead, or cylindrical, as in the inside surface of a test
tube, or the external surface of a rod. Alternatively, the
surface may be flat such as a sheet, test strip, etc.
Preferred supports include polystyrene beads. Those skilled
in the art will know many other suitable carriers for binding
antibody or antigen, or will be able to ascertain the same by
use of routine experimentation.

The binding activity of a given lot of anti-wild type or
mutant PKD1 peptide antibody may be determined according to
well-known methods. Those skilled in the art will be able to
determine operative and optimal assay conditions for each
determination by employing routine experimentation.

One of the ways in which the PKD1 peptide-specific
antibody can be detectably labeled is by linking the same to
an enzyme for use in an enzyme immunoassay (EIA) (Voller, A.,
"The Enzyme Linked Immunosorbent Assay (ELISA)", Diagnostic
Horizons 2:1-7, 1978 (Microbiological Associates Quarterly
Publication, Walkersville, MD); Voller, A. et al., J. Clin.
Pathol. 31:507-520 (1978); Butler, J.E., Meth. Enzymol.
73:482-523 (1981); Maggio, E. (ed.), ENZYME IMMUNOASSAY, CRC
Press, Boca Raton, FL, 1980; Ishikawa, E. et al. (eds.),
ENZYME IMMUNOASSAY, Igaku-Shoin, Tokyo, 1981). The enzyme
which is bound to the antibody will react with an appropriate
substrate, preferably a chromogenic substrate, in such a
manner as to produce a chemical moiety which can be detected,
for example, by spectrophotometric, fluorimetric or visual
means. Enzymes which can be used to detectably label the
antibody include, but are not limited to, malate
dehydrogenase, staphylococcal nuclease, delta-5-steroid
isomerase, yeast alcohol dehydrogenase,
alpha-glycerophosphate dehydrogenase, triose phosphate
isomerase, horseradish peroxidase, alkaline phosphatase,
asparaginase, glucose oxidase, beta-galactosidase,
ribonuclease, urease, catalase, glucose-6-phosphate
dehydrogenase, glucoamylase and acetylcholinesterase. The
detection can be accomplished by colorimetric methods which
employ a chromogenic substrate for the enzyme. Detection may
also be accomplished by visual comparison of the extent of
enzymatic reaction of a substrate in comparison with
similarly prepared standards.
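
The comparison with similarly prepared standards can be made quantitative by reading a sample off a standard curve. The sketch below assumes absorbance readings that increase monotonically with known antigen concentration and uses straight-line interpolation between adjacent standards. Real assays often fit a curve instead, and all values shown are hypothetical.

```python
from bisect import bisect_left


def interpolate_concentration(absorbance: float,
                              standards: list[tuple[float, float]]) -> float:
    """Concentration of a sample from (concentration, absorbance) standards."""
    points = sorted(standards, key=lambda p: p[1])  # ascending absorbance
    readings = [a for _, a in points]
    if not readings[0] <= absorbance <= readings[-1]:
        raise ValueError("sample absorbance lies outside the standard range")
    i = bisect_left(readings, absorbance)
    if readings[i] == absorbance:
        return points[i][0]
    (c0, a0), (c1, a1) = points[i - 1], points[i]
    return c0 + (absorbance - a0) * (c1 - c0) / (a1 - a0)


# Hypothetical standards: (ng/mL of PKD1 peptide, absorbance).
standards = [(0.0, 0.05), (10.0, 0.25), (20.0, 0.45), (40.0, 0.85)]
print(round(interpolate_concentration(0.35, standards), 2))  # 15.0
```

A sample outside the range of the standards is rejected rather than extrapolated, since the curve's shape beyond the standards is unknown.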

Detection may be accomplished using any of a variety of
other immunoassays. For example, by radioactively labeling
the antibodies or antibody fragments it is possible to detect
PKD1 wild type or mutant peptides through the use of a
radioimmunoassay (RIA) (see, for example, Weintraub, B.,
Principles of Radioimmunoassays, Seventh Training Course on
Radioligand Assay Techniques, The Endocrine Society, March,
1986, which is incorporated by reference herein). The
radioactive isotope can be detected by such means as the use
of a gamma counter or a scintillation counter or by
autoradiography.

It is also possible to label the antibody with a
fluorescent compound. When the fluorescently labeled
antibody is exposed to light of the proper wavelength, its
presence can then be detected due to fluorescence. Among the
most commonly used fluorescent labeling compounds are
fluorescein isothiocyanate, rhodamine, phycoerythrin,
phycocyanin, allophycocyanin, o-phthalaldehyde and
fluorescamine.

The antibody can also be detectably labeled using
fluorescence emitting metals such as 152Eu, or others of the
lanthanide series. These metals can be attached to the
antibody using such metal chelating groups as
diethylenetriaminepentaacetic acid (DTPA) or
ethylenediaminetetraacetic acid (EDTA).

The antibody also can be detectably labeled by coupling
it to a chemiluminescent compound. The presence of the
chemiluminescent-tagged antibody is then determined by
detecting the presence of luminescence that arises during the
course of a chemical reaction. Examples of particularly
useful chemiluminescent labeling compounds are luminol,
isoluminol, aromatic acridinium ester, imidazole,
acridinium salt and oxalate ester.

Likewise, a bioluminescent compound may be used to
label the antibody of the present invention. Bioluminescence
is a type of chemiluminescence found in biological systems
in which a catalytic protein increases the efficiency of the
chemiluminescent reaction. The presence of a bioluminescent
protein is determined by detecting the presence of
luminescence. Important bioluminescent compounds for
purposes of labeling are luciferin, luciferase and aequorin.

6. EXAMPLE: DETERMINATION OF THE PKD1 INTERVAL
VIA GENETIC POLYMORPHISM ANALYSIS

In the Working Example presented herein, genetic linkage
studies are discussed which successfully reduced the
potential PKD1 interval from approximately 750 kb to
approximately 460 kb, thus substantially narrowing the
genomic region in which the gene responsible for ADPKD lies.

6.1 MATERIALS AND METHODS

Sequencing techniques: Sequencing of cDNA clones and
genomic clones was carried out using an Applied Biosystems
ABI 373 automated sequencing machine according to the
manufacturer's recommendations or by manual sequencing
according to the method of Ausubel F. M. et al., eds., 1989,
Current Protocols in Molecular Biology, Vol. I, Greene
Publishing Associates, Inc., and John Wiley & Sons, New York,
pp. 7.0.1 ff.

Inserts from the cDNA phage clones were excised with
EcoRI and ligated into the appropriate cloning sites in the
polylinker of the pBlueScript plasmid (Stratagene). Primers for
sequencing of the plasmid clones were based on the known
sequence of the polylinker. A second set of sequencing
primers was based on the DNA sequences obtained from the
first sequencing reactions. Sequences obtained using the
second set of primers were used to design a third set of
primers, and so on. Both strands of the double-stranded
plasmids were sequenced.

PCR products were sequenced using the dsDNA cycle
sequencing system of GIBCO-BRL (Gaithersburg, MD) according
to the manufacturer's instructions. Prior to sequencing, each
PCR product was purified by passing the DNA twice through a
Centricon column according to the manufacturer's
instructions (Amicon, Beverly, MA, USA). 100-200 ng of each
purified PCR product was used as template in the sequencing
reaction.

Genomic sequences were obtained from PCR products as
well as from subclones of the cosmids. To ensure that the
correct locus sequence was obtained over the duplicated
locus, only cGGG10 and cDEB11 sequence was utilized when
identifying intron/exon boundaries.

DNA labelling: Double-stranded DNA probes were made by
labelling DNA by the method of Feinberg and Vogelstein, 1983,
Anal. Biochem. 132:6-13. Primers were end-labelled with
γ-32P-ATP using the method of Ausubel F. M. et al., eds., 1989,
Current Protocols in Molecular Biology, Vol. I, Greene
Publishing Associates, Inc., and John Wiley & Sons, New York,
pp. 4.8.2 ff.

PCR conditions: Conditions for the PCR reactions were
determined empirically for each reaction by analyzing an
array of reaction conditions with the following variables:
magnesium concentration (1 mM, 2 mM, 4 mM); annealing
temperature; extension time; primer concentration; and primer
concentration ratio.

The fixed conditions were:

1. extension at 72°C using Taq polymerase, 2.5 U/100 µl
reaction volume;
2. denaturation at 95°C for 1 minute; and
3. annealing for 30 seconds.
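The empirical optimization described above amounts to testing every combination of the variable parameters while holding the fixed conditions constant. The following Python sketch enumerates such a grid. Only the magnesium concentrations and the fixed conditions come from the text; the annealing temperatures, extension times, primer amounts and primer ratios are hypothetical placeholders, since the values actually tested are not stated.

```python
from itertools import product

# Magnesium concentrations (mM) are the values listed in the text.
MG_MM = [1, 2, 4]

# Hypothetical placeholder values: the text names these variables but does
# not say which settings were actually tested.
ANNEAL_C = [55, 60, 65]      # annealing temperature, degrees C
EXTENSION_S = [30, 60]       # extension time at 72 degrees C, seconds
PRIMER_PMOL = [5, 10]        # amount of each primer per reaction
PRIMER_RATIO = [1.0, 2.0]    # sense:antisense primer ratio

# Parameters held constant in every reaction, as stated in the text.
FIXED = {
    "extension_c": 72,
    "taq_units_per_100_ul": 2.5,
    "denature_c": 95,
    "denature_s": 60,
    "anneal_s": 30,
}


def condition_grid():
    """Yield one dict per combination of the variable PCR parameters."""
    for mg, ta, ext, pmol, ratio in product(
        MG_MM, ANNEAL_C, EXTENSION_S, PRIMER_PMOL, PRIMER_RATIO
    ):
        yield {**FIXED, "mg_mM": mg, "anneal_c": ta, "extension_s": ext,
               "primer_pmol": pmol, "primer_ratio": ratio}


grid = list(condition_grid())
print(len(grid), "reaction conditions to test")
```

With these placeholder values the grid contains 3 x 3 x 2 x 2 x 2 = 72 reactions per primer pair, which illustrates why the number of levels per variable is usually kept small.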
Primer design: Primers were designed using the computer
program "PRIMER".

Genetic linkage studies: Genetic linkage studies were
carried out using computerized algorithms (Lathrop G.M. et
al., 1984, Proc. Natl. Acad. Sci. USA 81:3443-3446; Lathrop
G.M. and Lalouel J.-M., 1984, Am. J. Hum. Genet. 36:460-465;
Lathrop G.M., Lalouel J.-M., Julier C. and Ott J., 1985, Am. J.
Hum. Genet. 37:482-498).
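The cited programs compute likelihoods over whole pedigrees. As a much simpler illustration of the underlying quantity, the sketch below computes a two-point LOD score for phase-known, fully informative meioses in which R of N meioses are recombinant: LOD(θ) = log10[θ^R (1 - θ)^(N - R) / 0.5^N]. This is an assumption-laden simplification, not the algorithm of the cited programs, and the example counts are hypothetical.

```python
import math


def lod(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for `meioses` phase-known, informative meioses.

    Compares the likelihood of the observed recombinants at recombination
    fraction `theta` with the likelihood under free recombination (0.5).
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(recombinants: int, meioses: int):
    """Return (theta, LOD) maximizing the LOD over a 0.01-step grid."""
    thetas = [k / 100 for k in range(1, 51)]  # 0.01 ... 0.50
    best = max(thetas, key=lambda t: lod(recombinants, meioses, t))
    return best, lod(recombinants, meioses, best)


# Hypothetical example: 2 recombinants observed in 20 informative meioses.
theta_hat, score = max_lod(2, 20)
print(f"theta = {theta_hat:.2f}, LOD = {score:.2f}")
```

By convention, a maximum LOD score of 3 or more (odds of about 1000:1 in favor of linkage) is taken as evidence of linkage.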


Single-stranded conformational polymorphism analysis (SSCP):

SSCP analysis to detect sequence polymorphisms was
carried out according to the method of Orita et al., 1989,
Genomics 5:874-879. Primers were designed to amplify each
exon (see FIG. 10 and Table 1, below). The 3' end of each
primer was designed to lie approximately 20-50 bp from the
nearest intron/exon boundary so that mutations in the splice
donor and acceptor sites could be detected.

Table 1: Primer Sequences from the PKD1 Gene

Primer Name | Sequence (5'-3')       | Sense/Antisense
KG8-F9      | CTGCCGGCCTGGTGTCG      | sense
KG8-F11     | AGGGTCCACACGGGCTCGG    | sense
KG8-F23     | CAGGGTGTCCGTGCGTGACTG  | sense
KG8-F25     | GTCCAGCACTCCTGGGGAGA   | sense
KG8-F26     | ACGCAAGGACAAGGGAGTAG   | sense
KG8-F27     | AGTGCCGCGGCCTCCTGAC    | sense
KG8-F28     | GCTGGCCTAGGCGGCTTCCA   | sense
KG8-MF2     | CACCCCACGGCTTTGCACT    | sense
KG8-MF4     | CCCAGGCAGCGAGGCTGTC    | sense
KG8-K02     | ACACCAGGCCAACAGCGACTG  | antisense
KG8-R9      | ACAGCCACCAGGAGCAGGCTGA | antisense
KG8-R13     | TGTAGCGCGTGAGCTCCAG    | antisense
KG8-R23     | CACCCCACCCTACCCCAG     | antisense
KG8-R24     | GGAGGCCACAGGTGAGGCT    | antisense
KG8-R27     | CGGAGGAGTGAGGTGGGCTCC  | antisense
KG8-R28     | AGCCATTGTGAGGACTCTCCC  | antisense
NKG9-F2     | AAGACCTGATCCAGCAGGTCC  | sense
NKG9-F07    | CAGCACGTCATCGTCAGG     | sense
NKG9-R03    | CTCCCAGCCACCTTGCTC     | antisense
NKG9-R07    | GCAGCTGTCGATGTCCAG     | antisense
NKG9-RM2    | TCTGTCCAACAAAGGCCTG    | antisense
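For a quick sanity check of primers such as those in Table 1, the sketch below computes length, GC content, an approximate melting temperature by the Wallace rule (Tm of about 2(A+T) + 4(G+C) °C, a crude approximation intended for short oligonucleotides) and the reverse complement. The text does not report melting temperatures; these figures are illustrative only and are not the design criteria used by the PRIMER program.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# A few primers copied from Table 1 (written 5'->3').
PRIMERS = {
    "KG8-F9": "CTGCCGGCCTGGTGTCG",
    "KG8-R13": "TGTAGCGCGTGAGCTCCAG",
    "NKG9-RM2": "TCTGTCCAACAAAGGCCTG",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, "
          f"Tm ~{wallace_tm(seq)} C")
```

Because the Wallace rule ignores salt, primer concentration and nearest-neighbor effects, it is only useful for flagging primer pairs whose estimated melting temperatures differ widely.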

6.2 RESULTS 

It was previously shown that the PKD1 gene maps, by
genetic linkage, to the interval between the polymorphic
genetic markers D16S259 (which lies on the telomeric side of
PKD1) and D16S25 (which lies on the centromeric side of PKD1)
(see Somlo et al., 1992, Genomics 13:152). The smallest
interval between genetic markers, called the PKD1 interval,
was found to be approximately 750 kb (see Germino et al.,
1992, Genomics 13:144). The PKD1 interval was isolated as a
series of forty overlapping cosmid and phage clones. The
cloned DNA contained the entire PKD1 interval with the
exception of two gaps of less than 10 kb and less than 50 kb
(see FIG. 1; Germino et al., Genomics 13:144, 1992).

In the Example presented herein, in order to reduce the
PKD1 interval still further, a systematic search for
additional polymorphic markers was undertaken. Single-stranded
DNA probes containing (CA)n repeats were hybridized to the set of
clones from the PKD1 interval. The phage clone w5.2 (see
FIG. 1) was found to hybridize to the probe, and the sequence
flanking the (CA)n repeat (the w5.2 repeat) was determined using
phage DNA as a template. Primers for the polymerase chain reaction
(PCR) were designed and used to detect polymorphism within
the w5.2 CA repeat. The position of the w5.2 CA repeat is
shown in FIG. 2. The w5.2 CA repeat was used in genetic
linkage studies in 15 PKD1 families and was found to lie proximal
to the PKD1 locus. This experiment reduced the size of the
PKD1 interval to approximately 460 kb, as shown in FIG. 2.

7. EXAMPLE: IDENTIFICATION OF POTENTIAL PKD1
TRANSCRIPTS

In the Working Example presented herein, transcription
units within the 460 kb PKD1 interval (FIG. 2) defined in
Section 6, above, were identified. The interval was found to
have a maximum of 27 transcriptional units (TUs), which
contained a total of approximately 300 kb.

7.1 Materials and Methods

cDNA library screening: cDNA libraries were prepared from
several sources including EBV-transformed lymphocytes,
teratocarcinoma tissue, fetal kidney and HeLa cells. In
addition, a human adult kidney library was purchased from
Clontech Inc. (San Diego, CA).

Total RNA from each tissue was prepared by the
guanidinium chloride method. First-strand cDNA was synthesized
using random six-base oligonucleotides by the method
of Zhou et al., 1992, J. Biol. Chem. 267:12475. EcoRI
sites within the cDNA were blocked by DNA methylase. The
cDNA was flush-ended with T4 kinase and EcoRI linkers were
added with DNA ligase. The cDNA was cleaved with EcoRI and
ligated into either bacteriophage lambda-gt10 or lambda-ZAP
(Stratagene). The phage were packaged with high-efficiency
packaging extract (Stratagene). At least one million primary
clones were plated. The library was amplified 100-fold and
stored at 4° C.
At least 500,000 plaques of each library were screened
with each cosmid clone at a density of 25,000 per 75 mm
diameter plate. Duplicate filter lifts were made of each
plate (Ausubel, supra). The radiolabeled probes were
incubated with an excess of unlabelled denatured human DNA
and then added to the library filters in a sodium phosphate
buffer at 65° C. for 16 hours. The filters were washed in
2x SSC at 65° C. for 1 hour and in 0.1x SSC, 0.1x SDS at 65° C.
for one hour. Kodak XAR-5 film was exposed to the library
filters for 4-16 hours. Duplicate positives were picked and
replated at a density of approximately 100-500 per plate.
Filter lifts of these secondary plates were made and hybridized
as for the primary lifts; pure isolated plaques were obtained and
inoculated into 50 ml cultures, and the phage DNA was purified.


Sequencing techniques: Techniques were as described in
Section 6.1, above.

7.2 Results

To identify transcribed sequences within the PKD1
interval (FIG. 2), the cosmid and phage clones from the
interval were hybridized to cDNA libraries made from a
variety of human tissues including fetal and adult kidney,
teratocarcinoma, adult liver, lymphoblast, HeLa, and adult
brain. More than 100 hybridizing cDNA clones were
identified. These clones were subcloned into pBlueScript
plasmids and sequenced. The sequence data combined with
hybridization data (between cDNA clone and genomic clone)
allowed the cDNA clones to be assigned to a maximum of 27
transcription units, as described below.

Namely, hybridization between two cDNA clones was taken as
evidence that the clones are part of the same transcription
unit. Similarly, sequence identity over more than 25 bp
between cDNA clones was used as evidence that the clones
were part of the same transcription unit.
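The two grouping criteria above define a relation between clones, and taking its transitive closure assigns every clone to a transcription unit. The sketch below does this with a union-find structure. It assumes exact, same-strand identity of at least 26 bp as a stand-in for "sequence identity of greater than 25 bp" (real comparisons would also test the reverse complement and tolerate sequencing errors), takes cross-hybridization results as an input list, and uses hypothetical clone names with randomly generated sequences.

```python
import random
from itertools import combinations


def share_identity(a: str, b: str, min_len: int = 26) -> bool:
    """True if a and b share an identical same-strand stretch of at least
    min_len bases (i.e. more than 25 bp with the default)."""
    kmers = {a[i:i + min_len] for i in range(len(a) - min_len + 1)}
    return any(b[i:i + min_len] in kmers for i in range(len(b) - min_len + 1))


def group_into_units(clones: dict, hybridizing_pairs=()) -> list:
    """Union-find grouping of clone names into putative transcription units.

    clones: clone name -> sequence
    hybridizing_pairs: (name, name) pairs observed to cross-hybridize
    """
    parent = {name: name for name in clones}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in hybridizing_pairs:
        union(a, b)
    for a, b in combinations(clones, 2):
        if share_identity(clones[a], clones[b]):
            union(a, b)

    units = {}
    for name in clones:
        units.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in units.values())


# Hypothetical clones: A and B overlap by 30 bp; C and D only cross-hybridize.
rng = random.Random(0)
core = "".join(rng.choice("ACGT") for _ in range(60))
clones = {
    "cloneA": core[:40],
    "cloneB": core[10:],
    "cloneC": "".join(rng.choice("ACGT") for _ in range(40)),
    "cloneD": "".join(rng.choice("ACGT") for _ in range(40)),
}
print(group_into_units(clones, hybridizing_pairs=[("cloneC", "cloneD")]))
```

Union-find keeps the grouping independent of the order in which evidence is considered: two clones end up in the same unit whenever a chain of overlaps or hybridizations connects them.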

Table 2, below, lists these units (a-z, aa) by the name
of the longest clone.

Table 2

Putative Transcriptional Unit
Sequences Isolated From the PKD1 Region
(Candidate Genes in the PKD1 Region)

Unit | Clone | Insert Size (kb) | cDNA Libraries | Motif
a.   | 20.7  | 2.1 | cy, terat      |
b.   | SazD  | 2.7 | cy             | G-protein β subunit-like
c.   | SazB  | 2.2 | cy, terat      | scERV from yeast
d.   | Saz10 | 4.0 | cy, lym        |
e.   | Saz13 | 1.5 | cy, terat      | tandem 120-amino-acid repeat; ZO1 family
f.   | Saz20 | 5.5 | cy, lym, terat |
g.   | KG8   | 3.4 | lym            |
h.   | NKG9  | 1.8 | lym            |
i.   | NKG10 | 2.8 | lym            |
j.   | NKG11 | 2.4 | lym            |
k.   | Nik4  | 0.9 | kid            |
l.   | Nik7  | 2.3 | lym, terat     | rab gene motif
m.   | KG3   | 3.8 | terat, cy      | G-protein β subunit-like
n.   | Nik9  | 2.2 | cy             | ankyrin repeat
o.   | KG4   | 0.6 | kid            |
p.   | KM17  | 1.6 | terat, cy      | G-protein β subunit-like
q.   | Nik10 | 1.6 | lym            |
r.   | KG5   | 2.6 | cy             | zinc-finger protein
s.   | KG1   | 1.1 | kid            | DNase
t.   | KGo   | 3.4 | kid, cy, lym   | human homolog of mouse RNSP1 gene
u.   | Nik3  | 3.2 | terat, lym, cy | *
v.   | Nik2  | 3.4 | terat, lym, cy | *
w.   | Nik1  | 0.8 | kid            | *
x.   | Nik8  | 1.6 | lym            | *
y.   | KG17  | 2.2 | lym            |
z.   | AJ1   | 1.4 | cy             | cyclin-F homolog
aa.  | MAR1  | 2.0 | kid            | MDR-like

* u, v, w and x are part of an 8 kb transcriptional unit (nik 823) which produces an MDR-like channel. MAR1 is another member of the gene family. ATP-dependent transporter cyclin proton-channel of vacuolar proton ATPase.

cDNA library from which the clone was obtained: cy = cyst; terat = teratocarcinoma; lym = lymphoblast; kid = kidney.
Thus, by virtue of their genomic localization, these 27
transcription units were considered to be candidate genes for PKD1.
The total transcribed cDNA in the 27 transcription units
amounted to about 60 kb.
The sequence of each clone was compared with sequences
deposited in the public databases GenBank, EMBL, and
SwissProt. Several of the cDNA clones contained sequences
predicted to code for known protein motifs. Because so
little was known of the molecular basis of ADPKD, none of the
candidate genes could be ruled out by virtue of sequence
motifs.

8. PKD1 INTERVAL - NORTHERN ANALYSIS 
In the Working Example presented herein, the transcriptional
expression patterns of the TUs described above in Section 7
were analyzed.

8.1 MATERIALS AND METHODS 
Northern blot analysis: Poly(A)+ RNA (2 µg) from heart, brain,
placenta, lung, liver, skeletal muscle, kidney and pancreas
was hybridized with radiolabelled cDNA probes from the TUs
within the PKD1 interval, under standard conditions.

8.2 RESULTS

Inserts from the cDNA clones of the TUs described in
Section 7, and listed in Table 2, above, were used to probe
Northern blots containing total RNA and polyA-enriched RNA
from normal human organs and from between 8 and 10 kidneys
removed from patients with ADPKD.

The expression profile was compared with the pattern of
pathology in ADPKD to determine a priority for further
characterization. The Northern analysis demonstrated that all
of the TUs in the PKD1 interval except Nik9 (26 of the 27) were
expressed in kidney. Nik9 mRNA was found to be
abundant in human brain but expressed at very low levels in
fetal and adult human kidney. These data, therefore,
indicated that Nik9 is not the PKD1 gene. No consistent
differences were observed between normal and ADPKD kidneys
for any transcript.

9. EXAMPLE: PKD1 INTERVAL MUTATION SCREENS
A systematic search was undertaken to detect mutations
in ADPKD patients in the transcribed regions listed in Table
2. The mutation screen used several independent techniques.
Southern blot analysis of patient DNA digested with at least
three different restriction endonucleases was performed.
Several differences between the restriction patterns were
detected, but none was found only in patients with ADPKD.
Single-stranded conformational polymorphism analysis was
carried out using cDNA isolated from patient transformed
lymphocytes as a template. A large number of allelic
differences was found, but none was found to alter the
deduced translation product. Sequence analysis of the
KG8 cDNA was carried out in seven ADPKD patients and one
normal individual. The deduced coding region of 2.6 kb was
sequenced using cDNA, made by reverse transcription from patient
transformed lymphocyte mRNA, as a template. The cDNA was
amplified by PCR in a series of overlapping sections and the
PCR products were sequenced. No sequence differences were
detected between patients and normal individuals. In this
way more than 80% of the coding DNA in the transcription
units was scanned and no mutations were found in PKD1
patients. These experiments excluded the scanned segments of
the transcription units with a likelihood of 95%, based on the
reasonable assumption that no ADPKD mutation accounts for
>70% of all ADPKD cases.

Thus, the following transcription units were excluded:
SazB, SazD, Saz13, KG3, KG5, KG1, Saz20, KM17, Nik1, Nik2,
Nik3, Nik8, KG17, Nik7 and MAR1. These excluded transcripts
represent >80% of the combined identified coding sequences in
the PKD1 region.

It has previously been noted that de novo mutation to
ADPKD accounts for at least 1% of cases. Two mechanisms have
been shown to account for the vast majority of new-mutation
rates of this order. First, the coding region may be large.
Duchenne muscular dystrophy (DMD) provides an example of this
situation: the dystrophin gene, which is mutated in DMD, has a
transcript of approximately 14 kb. About 30% of DMD cases
arise by de novo mutation. The second mechanism that may
account for a high new-mutation rate is the presence of an
unstable repetitive element. Unstable trinucleotide repeats
in which the repeat sequence contains >50% C and G have been
shown to cause the fragile X syndrome, Huntington's disease
and myotonic dystrophy. In two of these diseases, high
mutation rates or the appearance of progressively more severe
disease in successive generations (anticipation) have been
documented.

A systematic search for trinucleotide repeats in the
PKD1 interval was undertaken. Single-stranded probes (15-25
nucleotides) containing all possible combinations of
trinucleotide repeats were synthesized, radiolabelled and
hybridized to Southern blots containing the complete set of
clones comprising the PKD1 interval. The hybridization and
washing conditions were adjusted to allow detection of all
perfect repeats of 15 nucleotides or more. Eight separate
banks of trinucleotide repeats within the PKD1 interval were
found. Primers were designed so that the trinucleotide
repeat arrays could be amplified by PCR and size-fractionated
on polyacrylamide gels. No differences were found between
ADPKD patients and controls.
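The hybridization screen above looks for perfect trinucleotide repeats of 15 nucleotides or more. Where sequence is available, an equivalent in silico scan can be sketched with a regular-expression backreference, as below. This is an illustration, not the method used in the Example, and the demonstration sequence is hypothetical.

```python
import re

# A trinucleotide unit followed by one or more exact copies of itself.
TANDEM_TRIPLET = re.compile(r"([ACGT]{3})\1+")


def find_trinucleotide_repeats(seq: str, min_len: int = 15):
    """Return (start, unit, copies) for perfect trinucleotide repeats of at
    least min_len nucleotides. Mononucleotide runs (e.g. AAA...) are skipped.
    The reported unit reflects the phase at which the scan entered the repeat."""
    hits = []
    for m in TANDEM_TRIPLET.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue
        if len(m.group(0)) >= min_len:
            hits.append((m.start(), unit, len(m.group(0)) // 3))
    return hits


# Hypothetical sequence: a (CAG)6 array, a poly(A) run and a short (CTG)4 array.
demo = "TTTT" + "CAG" * 6 + "GATTACA" + "A" * 18 + "CTG" * 4
print(find_trinucleotide_repeats(demo))
```

Only the (CAG)6 array is reported: the poly(A) run is a mononucleotide repeat, and the (CTG)4 array is 12 nucleotides long, below the 15-nucleotide detection limit used in the hybridization screen.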

Additionally, two other screening methods were attempted
for the identification of trinucleotide expansions in the
PKD1 interval. Southern blots of DNA from normal and
affected individuals were probed with inserts containing the
repeats. This revealed no polymorphisms. Further, multiply
restricted DNA samples (RsaI/Sau3A/HinfI) were probed
with trinucleotide repeat oligonucleotides. Although myotonic
dystrophy and fragile X mental retardation patients could be
identified via such methods, it was not possible to identify
any common pattern in ADPKD patients.
The cDNA clones Nik1, Nik2, Nik3, and Nik8 were found to
hybridize to an 8 kb transcript present in kidney. These
clones were assumed to be part of the same transcript. PCR
products that bridged the three gaps in sequence between the
four clones were obtained using primers based on sequences
within the four cDNA clones. In this way approximately 8 kb of
the transcribed DNA sequence of the gene represented by Nik1,
Nik2, Nik3, and Nik8 was obtained. Because the coding region
is large, the gene was expected to have a high spontaneous
mutation rate and therefore to be a good candidate for the
PKD1 gene. A detailed exon-by-exon search of the gene,
however, revealed no evidence of mutations in ADPKD patients.
This left only one TU within the region which was considered
large enough to be a reasonable candidate for the PKD1 gene.
The characterization of clones and sequences within this TU,
part of the putative PKD1 gene, is described below in the
Working Examples presented in Sections 10 and 11.

10. EXAMPLE: SSCP ANALYSIS OF ADPKD PATIENTS
In the Working Example presented herein, an SSCP
analysis of genomic DNA amplified from normal individuals and
ADPKD patients was conducted, which identified ADPKD-specific
allelic differences that map to the single remaining candidate
gene of the PKD1 interval described above in the Working
Example presented in Section 9.

10.1 Materials and Methods
SSCP Analysis: Single-stranded conformational polymorphism
(SSCP) analysis was performed as follows: 50 ng of genomic DNA was
amplified by PCR under standard conditions in a reaction
volume of 20 µl. Ten microliters of the amplified product was
added to 90 µl of formamide buffer, heated at 97°C for 4-5
minutes, and cooled on ice. Four microliters of the reaction
mixture was loaded on a polyacrylamide gel (10%, 50:1
acrylamide:bisacrylamide) containing 10% glycerol. The gel
was run at 4°C for 12 hours at 10 W in 0.5X TBE
buffer. The gel was dried and exposed to a Molecular Dynamics
PhosphorImager screen for 4 to 16 hours.

Intron/Exon Mapping: Primers designed from the cDNA clones were
used to PCR-amplify genomic DNA sequences. Amplified
products were sequenced using standard methods. Sequences
that differed from the cDNA sequences indicated
intron sequences.
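The inference above, that stretches of genomic sequence absent from the cDNA are introns, can be sketched as a greedy exact-match placement of successive cDNA segments on the genomic sequence. The sketch assumes that the cDNA and genomic sequences come from the same locus and contain no sequencing differences. When an intron begins with the same bases as the following exon, the greedy boundary is ambiguous; in practice the GT...AG splice consensus is used to place it. The exon and intron sequences in the example are hypothetical.

```python
def map_exons(cdna: str, genomic: str, min_exon: int = 10):
    """Greedy exact-match placement of cDNA segments on a genomic sequence.

    Returns (exons, introns): exons as half-open (genomic_start, genomic_end,
    cdna_start, cdna_end) tuples, introns as the genomic sequences lying
    between consecutive exons.
    """
    exons = []
    c_pos, g_pos = 0, 0
    while c_pos < len(cdna):
        # Try the longest remaining cDNA segment first, then shorter ones.
        for length in range(len(cdna) - c_pos, min_exon - 1, -1):
            hit = genomic.find(cdna[c_pos:c_pos + length], g_pos)
            if hit != -1:
                exons.append((hit, hit + length, c_pos, c_pos + length))
                c_pos += length
                g_pos = hit + length
                break
        else:
            raise ValueError(
                f"no genomic match of >= {min_exon} nt at cDNA position {c_pos}")
    introns = [genomic[a[1]:b[0]] for a, b in zip(exons, exons[1:])]
    return exons, introns


# Hypothetical three-exon gene with canonical GT...AG introns.
ex1, ex2, ex3 = "ATGACCGATTGCAAGGCTTC", "CTGGAAACCTTCGGATCCAA", "TCCGTTAAGCTTGA"
in1, in2 = "GTAAGTATTTTTTTCTTTCAG", "GTGAGTCCCTTTTATTCCAG"
genomic = "AAAA" + ex1 + in1 + ex2 + in2 + ex3 + "TTTT"
exons, introns = map_exons(ex1 + ex2 + ex3, genomic)
for intron in introns:
    print(len(intron), intron[:2], intron[-2:])
```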

PCR Amplification: Procedures for amplification were as
described above in Section 6.1.

10.2 Results
Because the large size of the putative
KG8/NKG9/NKG10/NKG11 transcript makes it a likely site for
mutation, the intron/exon structure of part of the gene
represented by KG8 and NKG9 was determined so that an
exon-by-exon search for mutations could be conducted. The
exon/intron structure analysis allowed PCR primers to be
designed for the amplification of several exons of the PKD1
gene.

These primers were used to PCR-amplify genomic DNA and
to perform SSCP analysis of ADPKD patients and normal
individuals. In two ADPKD patients, SSCP patterns were observed
that showed allelic differences. Both patients were heterozygous
for an SSCP variant that was not seen in a large number of
individuals from the normal population (FIGS. 3A-3B). In samples from
these two individuals, 4 bands are visible, instead of the 2
single-strand bands seen in samples from normal individuals.
The 4 bands are of equal intensity and are presumed to
comprise two allelic sense strands and two allelic antisense
strands.

Thus, the results discussed in this Example, coupled
with the analyses reported above in the Examples presented
in Sections 6 through 9, provide positive correlative evidence
that the gene corresponding to the putative transcription
unit of which the clones KG8, NKG9, NKG10 and NKG11 are
believed to be a part is the PKD1 gene.

11. EXAMPLE: MOLECULAR CHARACTERIZATION OF THE PKD1 GENE
In this Example, the complex structure of the PKD1 gene
and gene product is described. Included herein is a
description of the PKD1 gene structure, the nucleotide
sequence of the entire coding region of the PKD1 transcript,
and the amino acid sequence and domain structure of
the PKD1 gene product. This description not only represents
the first elucidation of the entire PKD1 coding sequence, but
also corrects errors in the portion of the PKD1
coding region which had previously been reported. Also, an
ADPKD1-causing mutation within the PKD1 gene, which results
in a frameshift, is identified. Further, the strategy
utilized to characterize this extensive and difficult nucleic
acid region is summarized.

A portion of the nucleotide sequence corresponding, in
large part, to the 3' end of the PKD1 gene had recently been
reported (European Polycystic Kidney Disease Consortium
[hereinafter abbreviated EPKDC], 1994, Cell 77:881-894).
Specifically, the terminal 5.6 kb of the PKD1 transcript was
studied and an open reading frame of 4.8 kb was reported.
The peptide this putative open reading frame encodes, which
would correspond to the carboxy-terminal portion of the PKD1
protein, did not reveal any homologies to known proteins and,
if this derived amino acid sequence was, in fact, part of the
PKD1 protein, its sequence did not suggest a function for the
PKD1 gene product.
Because of this lack of revealing information, and because
only a small percentage of ADPKD-causing
mutations appear to reside within the 3' end of the PKD1
gene, the characterization of the 5' end of the gene and a
more complete analysis of the PKD1 gene and gene product were
greatly needed.

As acknowledged by the EPKDC (EPKDC, 1994, Cell 77:881-894),
however, the elucidation of the complete PKD1 coding
sequence presents major problems. Unlike the 3' end of the
PKD1 gene, the 5' two-thirds of the gene appear to be
duplicated several times at other genomic positions.
Further, at least some of these duplications are transcribed.
Thus, great difficulties arise when attempting to distinguish
sequence derived from the authentic PKD1 locus from
sequence obtained from the duplicated PKD1-like loci.

11.1. MATERIALS AND METHODS

11.1.1. GENOMIC CLONES

The human P1 phage named PKD1521 was isolated from a
human P1 library using primers from the adjacent TSC2 gene.
The first screen utilized primers F33 (tcttctccaacttcacggctg)
and R32 (aaccagccaggttttggtcct), followed by F38
(caagtccagctcctctccc) and R40 (gctctttaaggcgtccctc), and the
library was ultimately screened with primers in the KG8 gene
(F9/R5; see page 68), where KG8-R5 is 5' gcgctttgcagacggtaggog 3'.
The cosmid cGGG10 has been previously described (Germino, G.G.,
Weinstat-Saslow, D., Himmelbauer, H., Gillespie, G.A.J., Somlo, S.,
Wirth, B., Barton, N., Harris, K.L., Frischauf, A.M. and
Reeders, S.T. (1992) Genomics 13:144-151). The cosmid
cGGG10 was mapped using various restriction enzymes as
described by the manufacturers. A random library of the
cosmid was constructed by cloning sheared DNA fragments into
the SmaI site of pUC19. Initial sequence assembly for the
cosmid cGGG10 was performed on forward and reverse sequences
of approximately 1000 randomly cloned fragments, and a
preliminary map was constructed using the restriction map of
the cosmid. Directed subclones of cGGG10 were made in the
plasmid pBluescript (Stratagene) in order to create
sequencing islands at specific physical locations. These large
subclones from cGGG10 were then restricted with more frequently
cutting enzymes and cloned into M13mp19 and mp18. In
addition, if gaps were found in cloned regions, directed
sequencing was performed from the flanking regions to join
the anchored contigs. A contig of 34.3 kb was constructed,
with two gaps in what appear to be highly repetitive regions
with no identifiable coding sequence. cDEB11 has been
described previously (Germino, G.G., Weinstat-Saslow, D.,
Himmelbauer, H., Gillespie, G.A.J., Somlo, S., Wirth, B.,
Barton, N., Harris, K.L., Frischauf, A.M. and Reeders, S.T.
(1992) Genomics 13:144-151). A random library was
constructed with sheared cDEB11 DNA cloned into the SmaI
site of pUC19. This cosmid was sequenced to obtain at least
2-fold coverage.
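The coverage targets above can be related to the expected completeness of a random (shotgun) library under the Lander-Waterman (Poisson) model. In this model the redundancy is c = N x L / G for N reads of length L on a target of length G, and a fraction of about e^-c of the target is expected to remain unsequenced. The sketch below assumes this idealized model, with uniformly random clones and no minimum overlap. The read length and cosmid insert size in the example are hypothetical.

```python
import math


def fold_coverage(n_reads: int, read_len: float, target_len: float) -> float:
    """Redundancy c = N * L / G (total bases read / target length)."""
    return n_reads * read_len / target_len


def fraction_unsequenced(c: float) -> float:
    """Expected fraction of target bases covered by no read, e^-c (Poisson model)."""
    return math.exp(-c)


# Expected gaps at the 2-fold minimum stated above, and at 5-fold for comparison.
for label, redundancy in [("2-fold", 2.0), ("5-fold", 5.0)]:
    print(f"{label}: ~{fraction_unsequenced(redundancy):.1%} of bases expected unsequenced")

# Hypothetical: ~1000 clones read from both ends (2000 reads) of 400 nt,
# on an assumed 40 kb cosmid insert.
c = fold_coverage(2000, 400, 40_000)
print(f"coverage = {c:.0f}-fold")
```

At 2-fold redundancy roughly one base in seven is still expected to be missing, which is consistent with the need for directed sequencing to close gaps described above.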

The sequencing was done by cycle sequencing and run on
ABI machines following the manufacturer's instructions, with
modifications as described below. Because of the difficulty
of sequencing certain regions, the standard sequencing
chemistry used with the ABI machines had to be modified.
Both dye-terminator and dye-primer chemistries were used, as
appropriate, to sequence different regions. Different
polymerases and different melting and polymerization
conditions were also used in order to optimize the quality of
the sequence. When sequencing across the CpG island at the
5' end of the PKD1 gene, the best sequencing results were
obtained by adding 5% DMSO to the polymerization step and
sequencing single-stranded templates.

11.1.2. cDNA LIBRARY SCREENING
The first cDNA used to screen libraries was KG8, which
maps to the unique region of the PKD1 locus and was recovered
from an adult lymphocyte library. In order to complete the
rest of the PKD1 transcript, fourteen new cDNAs were
sequenced to completion, four cDNAs were partially sequenced
and an additional 20 cDNAs were mapped against cGGG10.
Additional data were obtained from RT-PCR products of the
renal cell carcinoma cell line SW839 (ATCC).

Overlapping partial cDNAs described below were isolated
from lymphocyte and fetal kidney libraries. In this way, a
14 kb transcript was assembled, starting from the 3' end, until
the CpG island was reached. It is assumed that the 5' end of the
PKD1 transcript has thus been located: no other clones further
upstream were recovered upon further screening of those cDNA
libraries that had provided the majority of the cDNAs which
were used to assemble the full-length PKD1 cDNA.

The cDNAs FK7 and FK11 were recovered from a fetal
(gestational age of 14-16 weeks) kidney cDNA library using KG8
cDNA as a probe. This library was constructed with the
Superscript Lambda System (Gibco/BRL), using oligo d(T)-primed
cDNA. FK7 and FK11 were recovered as SalI inserts.
The cDNAs designated BK156, BK194, UN49 and UN52 were
recovered from a lymphocyte cell library using FK7 as a probe.
UN34 was recovered from the same library by
hybridizing with a ScaI-SalI 5'-end probe of FK7. UN53, UN54
and UN59 were recovered from the same lymphocyte library (M.
Owen laboratory, ICRF; Dunne, PhD thesis, 1994) by double
screening for clones that were negative when screened with
an FK7 probe and positive when screened with BK156 and UN52.
The cDNA NKG11 was recovered from a lymphocyte library
screened with cGGG10 and was described previously (Germino,
G.G., Weinstat-Saslow, D., Himmelbauer, H., Gillespie, G.A.J.,
Somlo, S., Wirth, B., Barton, N., Harris, K.L., Frischauf,
A.M. and Reeders, S.T. (1992) Genomics 13:144-151). The
cDNA named Fhkb21 was obtained from a Clontech fetal kidney
library using BK156 as a probe. MSK3 was obtained by probing
an adult kidney library (Clontech) with a probe from the 5' end
of KG8. MSK4 was obtained by nested RT-PCR with primers
spanning from exons 7-8 to exons 13-14, followed by a second
round of PCR with internal primers in exon 8 and exon 13.



11.1.3. cDNA SEQUENCING
The cDNAs were sequenced to 5-fold coverage by primer
walking and/or subcloning small fragments into M13 or
pBluescript. All cDNA sequences were compared to the cGGG10
cosmid sequence to assess whether they were from the correct
locus and to determine intron/exon boundaries. Discrepancies
were resequenced to determine whether the differences were
genuine. Some of the cDNAs described above were clearly
different from the genomic sequence, suggesting that these
cDNAs were encoded by another locus.
MSK3, FK7 and FK11, which were obtained using a PKD1-specific
probe (KG8), were found to be 100% identical to the genomic
sequence. The cDNA UN49, which showed 99% identity, is
possibly PKD1-specific. BK241, BK194, UN52, UN53, UN54,
UN59, BK156, Fhkb21 and NKG11 were 96-98% homologous to the
cGGG10-defined exon sequence and thus were assumed to have
originated from the duplicated loci. In general,
differences between the genomic and cDNA sequences were
nucleotide differences scattered throughout the cDNA sequence.
One exception is BK194, which has an extra CAG at position 1863
of the previously published partial sequence, arising from
alternative splicing of exon 33. Another exception is BK241,
which has an insertion, in a tandem repeat, of the sequence
TTATCAATACTCTGGCTGACCATCGTCA at position 1840 of
the previously published sequence (European PKD1 Consortium).
This sequence was not included in the authentic, full-length
PKD1 cDNA because it arose from the duplicated loci and would
produce a frameshift in the coding region of the PKD1
transcript. Except for BK241, cDNAs in the UN and BK series
that overlap with each other are more similar to each other
than to the genomic sequence.
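The classification and frameshift reasoning above can be expressed compactly. In the sketch below, the identity cut-offs are an assumption generalized from the values reported (100%, 99%, 96-98%), identity is computed on already-aligned sequences of equal length, and the only sequence used is the BK241 insertion quoted above. An insertion whose length is not a multiple of three shifts the reading frame. This is why the 28-nucleotide BK241 insertion would produce a frameshift, whereas the extra CAG in BK194 would not.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two equal-length, aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def classify_cdna(identity: float) -> str:
    """Assign a cDNA to a locus. The cut-offs are an assumption based on the
    reported values (100% PKD1, 99% possibly PKD1, 96-98% duplicated loci)."""
    if identity == 100.0:
        return "PKD1-specific"
    if identity >= 99.0:
        return "possibly PKD1-specific"
    return "probably from a duplicated PKD1-like locus"


def causes_frameshift(indel_len: int) -> bool:
    """An insertion or deletion shifts the frame unless its length is a multiple of 3."""
    return indel_len % 3 != 0


BK241_INSERT = "TTATCAATACTCTGGCTGACCATCGTCA"
print(len(BK241_INSERT), causes_frameshift(len(BK241_INSERT)))  # 28 True
print(causes_frameshift(len("CAG")))                            # False
```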

All sequence assembly was performed using the Staden
package XBAP (Dear, S. and Staden, R. (1991) Nucleic Acids
Res. 19:3907-3911).

11.1.4. PROTEIN HOMOLOGY SEARCHES 
The PKD1-derived amino acid sequence was subjected to
various sequence analysis methods (Koonin, E.V., Bork, P. and
Sander, C., 1994, Yeast chromosome III: new gene functions,
EMBO J. 13:493-503). To identify homologues, initial
database searches (SWISSPROT, PIR, GENPEPT, TREMBL, EMBL,
GENBANK, NRDB) were performed using the BLAST series of
programs (Altschul, S.F. and Lipman, D.J., 1990, Proc. Natl.
Acad. Sci. USA 87:5509-5513), applying a filter for
compositionally biased regions (Altschul, S.F. et al.,
1994, Nat. Genet. 6:119-129). By default, the BLOSUM62 amino
acid exchange matrix was used (Henikoff, S. and Henikoff, J.G.,
1993, Proteins 17:49-61). To reveal additional
candidate proteins that might be homologous to PKD1, the
BLOSUM45 and PAM240 matrices were also applied. Putative
homologues with a BLAST p-value below 0.1 were studied in
detail. Multiple alignments of the candidate domains were
carried out using CLUSTALW (Thompson, J.D., Higgins, D.G. and
Gibson, T., 1994, Nucleic Acids Res. 22:4673-4680), and
patterns (Rohde, K. and Bork, P., 1993, Comput. Appl.
Biosci. 9:183-189), motifs and profiles (Gribskov, M.,
McLachlan, A.D. and Eisenberg, D., 1987, Proc. Natl. Acad.
Sci. USA 84:4355-4358) were derived from them. These
constructs were used in iterative database searches: the
results of each search were used to improve the multiple
alignments, which were then used for the next
round of database searches. The final multiple alignment,
containing all retrieved members of a module family, was then
used as input for secondary structure prediction (Rost,
B. and Sander, C., 1994, Proteins 19:55-72).
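As a sketch of the candidate-selection step only, assuming search results have already been parsed into records, the Python below keeps each database entry's best p-value across the BLOSUM62, BLOSUM45 and PAM240 searches and retains entries below the 0.1 cutoff named in the text. The record layout and entry names are hypothetical placeholders; this code does not call BLAST.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str     # database entry name (hypothetical placeholders below)
    matrix: str      # scoring matrix used in that search
    p_value: float


P_CUTOFF = 0.1  # "BLAST p-value below 0.1" from the text


def candidates_for_review(hits):
    """Keep each subject's best hit across matrices if it falls below P_CUTOFF."""
    best = {}
    for hit in hits:
        if hit.p_value >= P_CUTOFF:
            continue
        kept = best.get(hit.subject)
        if kept is None or hit.p_value < kept.p_value:
            best[hit.subject] = hit
    return sorted(best.values(), key=lambda h: h.p_value)


hits = [
    Hit("protein_A", "BLOSUM62", 0.5),
    Hit("protein_A", "BLOSUM45", 0.03),  # only significant with the second matrix
    Hit("protein_B", "PAM240", 0.08),
    Hit("protein_C", "BLOSUM62", 0.4),
]
for h in candidates_for_review(hits):
    print(h.subject, h.matrix, h.p_value)
```

Merging per entry reflects why extra matrices were applied: an entry that misses the cutoff with BLOSUM62 can still surface through BLOSUM45 or PAM240.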

11.1.5. SSCP ANALYSIS

Single-strand conformation polymorphism (SSCP) analysis was
performed as follows: 50 ng of total genomic DNA was
amplified by PCR. In addition to the genomic DNA, each PCR
reaction contained 1 picomole of each primer (see below), 0.1
µl 32P-dATP (Amersham) and 0.2 µl AmpliTaq (Pharmacia) in PCR
buffer with a final Mg2+ concentration of 1.5 mM, in a final
volume of 20 µl. The amplification was performed for 25 cycles,
each consisting of 94° C. for 30 seconds, 60° C. for 30 seconds,
and 72° C. for 60 seconds.
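The cycling program can be written down as data, which makes the total time at temperature easy to check: 25 cycles of 30 + 30 + 60 seconds is 50 minutes. This Python sketch ignores ramp times and any initial or final hold, neither of which is specified in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


# One cycle of the SSCP amplification described above.
SSCP_CYCLE = (
    Step("denature", 94, 30),
    Step("anneal", 60, 30),
    Step("extend", 72, 60),
)
CYCLES = 25


def hold_time_minutes(cycle, n_cycles: int) -> float:
    """Total time spent at set temperatures, ignoring ramping between steps."""
    return n_cycles * sum(step.seconds for step in cycle) / 60


print(hold_time_minutes(SSCP_CYCLE, CYCLES))  # 50.0
```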

Intronic primers F25 and Mill-IR were used for the
initial SSCP evaluation. The fragment amplified with these
primers overlaps with the 5' end of KG8. Subsequently, the
primers F31 and R35 were used to amplify the fragment used
to sequence the PKD1 mutation.
Primers: F25 (5' TCGGGGCAGCCTCTTCCTG 3');
Mill-IR (5' TACAGGGAGGGGCTAGGG 3');
F31 (5' TGCAACTGCCTCCTGGAGG 3');
R35 (5' GGTCTGTCTCTGCTTCCC 3').
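For a quick comparison of the four primers, the sketch below computes GC content and a Wallace-rule melting temperature, 2(A+T) + 4(G+C) in degrees C. The Wallace rule is only a rough rule of thumb (nearest-neighbor methods are more accurate for primers of this length), and the text does not say how its 60° C. annealing temperature was chosen.

```python
PRIMERS = {
    "F25": "TCGGGGCAGCCTCTTCCTG",
    "Mill-IR": "TACAGGGAGGGGCTAGGG",
    "F31": "TGCAACTGCCTCCTGGAGG",
    "R35": "GGTCTGTCTCTGCTTCCC",
}


def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb melting temperature: 2*(A+T) + 4*(G+C), in degrees C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(f"{name:8s} {len(seq)} nt  GC={gc_fraction(seq):.0%}  Tm~{wallace_tm(seq)} C")
```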

One microliter of each sample was diluted into loading
dye (95% formamide, 20 mM NaOH, 1 mM EDTA, xylene cyanol,
bromophenol blue), denatured at 98°C for 5 minutes, cooled on
ice and loaded onto a 10% (50:1 acrylamide:bisacrylamide)
polyacrylamide gel containing 10% glycerol. The gel was run
at 4°C, 50 watts, for 3 hours. Exposure was overnight on
phosphoimager plates.

Amplified DNA from the one individual with a variant
pattern was then reamplified using the KG8-F31 and KG8-R35
primers and the above-described PCR conditions. Both
reamplified strands were then sequenced using standard
procedures for cycle sequencing of PCR products. 35 P-dCTP
incorporation was used.

11.2. RESULTS
A series of overlapping cosmid clones spanning the
predicted PKD1 region has been described (Germino, G.G.,
Weinstat-Saslow, D., Himmelbauer, H., Gillespie, G.A.J.,
Somlo, S., Wirth, B., Barton, N., Harris, K.L., Frischauf,
A.M. and Reeders, S.T., 1992, Genomics 13:144-151). The
integrity of the cosmid contig was confirmed by long-range
restriction mapping and genetic linkage analysis of
polymorphic sequences derived from the cosmids. Three
cosmids (cGGG1, cGGG10 and cDEB11, from centromere to
telomere) form a contig that includes the 3' end of the
adjacent gene, TSC2 (cDEB11), and spans over 80 kilobases
centromerically. At the proximal end of cGGG10, there is a CpG
island represented by the NotI site N54T (FIG. 1A).

In order to identify transcripts from the region, the
cosmid clones were hybridized to a set of five cDNA
libraries. KG8, a cDNA corresponding to the distal 3.2 kb of
the PKD1 sequence (which is located on cosmid cDEB11), was
mapped using a panel of somatic cell hybrids and found to
hybridize to a single locus on chromosome 16p13. Sequence
analysis confirmed that KG8 contains the polyadenylated 3' end
of a gene and has an open reading frame (ORF) of 2100 bp and
a 1068 bp 3' untranslated region. KG8 was also found to
contain a polymorphic (CA) microsatellite repeat. Analysis
of this repeat in a large number of PKD1 kindreds revealed no
recombination (supra).

To obtain clones extending 5' of KG8, the cosmids cGGG10
and cDEB11 were hybridized to different cDNA libraries. When
some of the positive clones obtained from these screens were
analyzed using somatic cell hybrid panels, they were found to
hybridize strongly to several loci on chromosome 16 in
addition to the PKD1 region. The restriction maps of the
hybridizing loci were so similar that it was concluded that a
series of recent duplications of part of the PKD1 gene
(excluding the PKD1 region from which the KG8 cDNA is
derived) had occurred, giving rise to several PKD1-like
genomic segments. This sequence duplication had been reported
by the European PKD1 Consortium (EPKDC, 1994, Cell 77:881-894).

Preliminary sequence analysis of the cDNA clones
revealed that the PKD1 and PKD1-like loci give rise to two or
more transcripts sharing 95-98% sequence identity. Because
of the high degree of similarity between PKD1 and PKD1-like
transcripts, it was not possible to determine the
correct full-length PKD1 cDNA sequence simply by assembling
overlapping partial cDNA clones.

To determine the sequence of the authentic PKD1
transcript, it was therefore concluded that the genomic PKD1
sequence should be compared to that of the PKD1-specific and
PKD1-like cDNAs homologous to the genomic sequence. To that
end, the entire cGGG10 cosmid and the PKD1 exon-containing
parts of the cDEB11 cosmid were sequenced, as described below.

11.2.1. SEQUENCE OF THE GENOMIC REGION OF THE PKD1 LOCUS
The duplicated portion of the PKD1 gene is largely
contained within the cosmid cGGG10. Prior to sequencing
cGGG10, the integrity of the clone was established in several
ways. First, the restriction map of cGGG10 was compared with
the map of the genomic DNA from the PKD1 region. Second,
restriction maps of the overlapping portions of cGGG1 and
cDEB11 were compared with cGGG10. Third, sequences derived
from cGGG10 and overlapping portions of cDEB11 showed 100%
identity. Finally, a P1 phage, PKD1521, was obtained by
screening a genomic P1 library with primers from the TSC2
gene, which maps near the PKD1 gene. No sequence differences
were found between PKD1521 and cGGG10.

It was necessary to pursue several approaches to obtain
the sequence of cGGG10 (see Section 11.1, above). Briefly,
due to the difficulty of sequencing certain regions,
modifications to standard automated sequencing chemistries
had to be made. Both dye-terminator and dye-primer sequencing
were used, as appropriate, for several different regions.
Further, different polymerases and different melting and
polymerization conditions were necessary to optimize the
quality of the nucleotide sequence. When sequencing across
the CpG island at the 5' end of the PKD1 gene, single-stranded
templates were used in addition to a modified polymerization
step.

A final ten-fold redundancy was achieved for the cGGG10
cosmid in order to compare the genomic sequence accurately
with that of the PKD1-specific and PKD1-like cDNAs
homologous to this cosmid. The cGGG10 sequences were
assembled into three contigs of 8 kb, 23 kb and 4.4 kb,
separated by 1 kb and 2.2 kb gaps. A two-fold redundancy was
obtained for the cDEB11 cosmid, whose sequence was compared
to PKD1 locus-specific cDNAs in order to obtain the intron/exon
boundaries of the unique 3' end of the PKD1 gene.


11.2.2. PKD1 AND PKD1-LIKE cDNAs
In order to identify putative coding regions and
intron/exon boundaries, genomic and cDNA sequences were
compared. cDNA clones had been identified in two ways.
First, fragments of cosmids cGGG10 and cDEB11 were hybridized
to five cDNA libraries. Second, each cDNA clone was
hybridized to fetal kidney and lymphocyte cDNA libraries to
obtain overlapping clones with which to extend the sequence
(FIG. 1B).

When the sequences of overlapping cDNAs were assembled,
a PKD1 transcript length of 14.2 kb was obtained. The
predominant transcript detected by Northern analysis using
the unique-sequence KG8 probe is approximately 14 kb,
suggesting that the cDNA clones represent the full length of
the PKD1 transcript.

Restriction and sequence analyses indicate that a CpG
island overlaps the 5' end of the sequence. CpG islands have
been found to mark the 5' ends of many genes. Further, the
most 5' cDNA clones (UN53, UN54 and UN59) all have identical
5' ends, providing additional evidence that no upstream PKD1
exons were missed (see Section 11.1, above).
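CpG islands are commonly recognized from sequence using the Gardiner-Garden and Frommer criteria: a window of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6. The text does not say which criteria were applied, so the thresholds in this Python sketch are an assumption, and the test sequences are synthetic.

```python
def cpg_stats(seq: str):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA window."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    gc_fraction = (c + g) / n
    # Expected CpG count if C and G were independent is c * g / n.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


print(cpg_stats("CG" * 100))                 # (1.0, 2.0)
print(looks_like_cpg_island("AT" * 150))     # False
```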

The multiple cDNAs used to assemble the PKD1 transcript,
along with the genomic sequence, are shown in FIGS. 1A and 1B.
By comparing the sequences of overlapping cDNAs and analyzing
the degree of homology between the different cDNAs and the
genomic sequence, it was possible to distinguish cDNAs
encoded by the authentic PKD1 locus from those encoded by the
homologous loci (see Section 11.1, above). The full-length
PKD1 transcript constructed from these exons contains a large
continuous open reading frame of 12,902 bp.
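The locus assignment here and in Section 11.1 rests on percent identity to the genomic exon sequence: 100% for PKD1-derived cDNAs, about 99% for possible PKD1 cDNAs, and 96-98% for cDNAs from the duplicated loci. A minimal Python sketch of that bucketing follows. It assumes the cDNA and genomic segments are already aligned without gaps, and the exact cutoffs are illustrative.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two gap-free aligned sequences of equal length."""
    if not a or len(a) != len(b):
        raise ValueError("expected two non-empty sequences of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def classify_cdna(identity: float) -> str:
    """Illustrative buckets following the identities reported in the text."""
    if identity == 100.0:
        return "PKD1 (authentic locus)"
    if identity >= 99.0:
        return "possibly PKD1"
    return "PKD1-like (duplicated locus)"


genomic = "ACGT" * 25                # 100-nt stand-in for a genomic exon
one_change = "T" + genomic[1:]       # 99% identical
three_changes = "TTT" + genomic[3:]  # 97% identical
for cdna in (genomic, one_change, three_changes):
    pid = percent_identity(genomic, cdna)
    print(f"{pid:.0f}% -> {classify_cdna(pid)}")
```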

Significant sequence heterogeneity was observed in these
cDNAs, suggesting that some level of alternative splicing of
the primary PKD1 transcript occurs. For this reason, an
attempt was made to isolate a minimum of two cDNAs containing
each exon, in order to increase the probability that all exons
contributing to the PKD1 transcript were detected. Formally,
however, it remains possible that there exist PKD1
transcripts which contain exons that are not present in the
cDNA clones sampled here.

Exon 17 was found in two cDNA clones (UN34 and BK156)
and in the cosmid sequence, but the exon was not incorporated
into the final PKD1 transcript, for several reasons. First,
the cDNA clones in which this exon is found differed from the
cosmid and are likely to represent PKD1-like genes, rather
than the authentic PKD1 gene (see Section 11.1, above).
Second, this exon is not found in FK1, a cDNA which was cloned
using a PKD1-specific probe (KG8). Finally, when included in
the full-length cDNA, this exon introduces a stop codon (743
nucleotides downstream of exon 17) that would produce a
truncated protein of 2651 amino acid residues. Further studies
are needed to assess whether this exon may be used in
different splice combinations in locus-specific transcripts.
However, an ADPKD patient has been identified with a
heterozygous mutation which introduces a stop codon at
position 10,601 of the PKD1 open reading frame (see Section
11.2.4, below), and other mutations that truncate the PKD1
protein have also been reported by the European PKD1
Consortium. Because a truncation this far downstream of the
exon 17 stop codon is sufficient to cause disease, it is
unlikely that transcripts which include exon 17 are
predominant forms in the kidney.
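Both the exon 17 argument and the patient mutation turn on an in-frame stop codon ending translation early. The Python sketch below translates a coding sequence with the standard genetic code until the first stop, checks it against the first row of FIG. 6, and shows that a C to T transition at the first base of a CAG (glutamine) codon gives TAG (stop). The function name and the short test sequences are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_until_stop(cds: str) -> str:
    """Translate reading frame 1 of `cds`, ending at the first stop codon."""
    cds = cds.upper().replace(" ", "")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First row of the FIG. 6 coding sequence (nucleotides 1-60).
ROW_1 = ("ATG CCG CCC GCC GCG CCC GCC CGC CTG GCG "
         "CTG GCC CTG GGC CTG GGC CTG TGG CTC GGG")
print(translate_until_stop(ROW_1))  # MPPAAPARLALALGLGLWLG

# A C-to-T transition at the first base of CAG (Gln) gives TAG (stop).
print(CODON_TABLE["CAG"], CODON_TABLE["TAG"])  # Q *
print(translate_until_stop("ATGCAGGCC"), translate_until_stop("ATGTAGGCC"))  # MQA M
```

Applied to the full coding sequence, the same logic accounts for the mutant length in Section 11.2.4: removing the final 728 residues of the 4304-residue protein leaves 3576.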


11.2.3. SEQUENCE ANALYSIS OF THE PREDICTED PKD1 PROTEIN 
The assembly of 46 PKD1 exons yields a predicted
transcript 14.2 kb in length, with 228 nucleotides of
putative 5' untranslated sequence and 790 nucleotides of 3'
untranslated sequence. The authentic PKD1 transcript differs
from the reported 3' PKD1 sequence (EPKDC, 1994, Cell
77:881-894) by the presence of two extra cytosines at position
12873 of the PKD1 open reading frame (corresponding to PBP
position 4563). Without these two cytosines, the reported
sequence is shifted out of frame, which yielded an erroneous
carboxy-terminal PKD1-derived amino acid sequence containing
almost 80 additional amino acid residues. The presence of the
two extra cytosines was confirmed with the cosmid sequence
derived from cDEB11.

The PKD1 protein derived from the assembled PKD1
transcript is 4304 amino acids in length, with a predicted
molecular weight of 462 kilodaltons. The nucleotide sequence
encompassing the Met-1 codon is CTAACGATGC, which represents
an uncommon translation start site (Kozak, M., 1984,
Nucleic Acids Res. 12:857-872). This methionine was
determined to be the putative PKD1 translation start site
because it is preceded by an in-frame stop codon 63 bases
upstream. Furthermore, the PKD1 coding region begins with a
23 amino acid region which exhibits many of the properties of
a signal peptide and corresponding cleavage site (von Heijne,
G., 1986, Nucleic Acids Res. 14:4683-4690; Welling, L.W. and
Grantham, J.J., 1972, J. Clin. Invest. 51:1063-1075).
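Two of the arguments above can be checked mechanically. The first helper in this Python sketch walks upstream from a start codon one codon at a time looking for an in-frame stop. The 5' sequence is not reproduced in this section, so the helper is run on a hypothetical toy sequence. The second helper applies a simplified version of von Heijne's (-3,-1) rule, assuming cleavage after residue 23 and treating A, G, S, C and T as the small, neutral residues; residues 1-23 are read from FIG. 6.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
SMALL_NEUTRAL = set("AGSCT")  # simplified stand-in for von Heijne's preferred residues


def upstream_in_frame_stop(seq: str, atg_index: int):
    """Offset (nt) from the nearest in-frame upstream stop codon to the ATG.

    Walks back one codon at a time from the start codon at `atg_index`
    (0-based) and returns None if the 5' end is reached without a stop.
    """
    seq = seq.upper()
    for i in range(atg_index - 3, -1, -3):
        if seq[i:i + 3] in STOP_CODONS:
            return atg_index - i
    return None


def fits_minus3_minus1_rule(signal_peptide: str) -> bool:
    """Residues -3 and -1 (relative to the cleavage site) are small and neutral."""
    return signal_peptide[-1] in SMALL_NEUTRAL and signal_peptide[-3] in SMALL_NEUTRAL


# Hypothetical 5' region: TAG two codons upstream of the ATG, in frame.
print(upstream_in_frame_stop("TAGCCCATGGCC", atg_index=6))  # 6

# Residues 1-23 of PKD1 as read from FIG. 6.
PKD1_N23 = "MPPAAPARLALALGLGLWLGALA"
print(len(PKD1_N23), fits_minus3_minus1_rule(PKD1_N23))  # 23 True
```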

In addition to the signal sequence, the identification
of five domains that have been identified in other proteins,
together with a newly discovered domain, strongly suggests the
extracellular location of at least the N-terminal half of the
protein. Immediately downstream of the signal sequence there
are two leucine-rich repeats (LRRs) (Figures 7A-7B). These
LRRs are flanked on both sides by cysteine-rich regions
which have homology to the flanking regions of a subset of
other LRRs. LRRs occur in numerous proteins and have been
shown to be involved in diverse forms of protein-protein
interaction. The number of LRRs within the respective
proteins varies between 2 and 29 (Kobe, B. and Deisenhofer, J.,
1994, Trends Biochem. Sci. 19:415-421). Adhesive platelet
glycoproteins form the largest group in the LRR superfamily
(Kobe and Deisenhofer, 1994, supra). The structure of the
array of 15 LRRs in porcine ribonuclease inhibitor (RI) has
recently been determined by crystallography (Kobe, B. and
Deisenhofer, J., 1995, Nature 374:183-186); the LRRs of the RI
protein form a horseshoe-like structure that surrounds RNase A
(Kobe and Deisenhofer, 1995, supra). It has been
suggested that proteins containing only a few LRRs, like the
PKD1 protein, interact with other proteins via the LRRs in
order to form the horseshoe-like superstructure for protein
binding (Kobe and Deisenhofer, 1994, supra).

Although LRRs occur in various locations in different
proteins, the additional flanking cysteine-rich, disulfide
bridge-containing domains define a subgroup of extracellular
proteins (Kobe and Deisenhofer, 1994, supra). Only a
few proteins have been sequenced so far that contain both
the distinct N-terminal and C-terminal flanking cysteine-rich
domains (Figures 7A-7B and 8). Among this group are toll,
slit, trk, trkB and trkC, which are all involved in cellular
signal transduction. For example, the Drosophila toll
protein is suspected to be involved in either adhesion or
signaling required to mediate developmental events such as
dorsal-ventral patterning (Hashimoto, C., Hudson, K.L. and
Anderson, K.V., 1988, Cell 52:269-279). The Drosophila slit
protein is thought possibly to mediate interactions between
growing axons and the surrounding matrix (Rothberg, J.M.,
Jacobs, J.R., Goodman, C.S. and Artavanis-Tsakonas, S.,
1990, Genes Dev. 4:2169-2187). In vertebrates, these
domains are found in the trk family of tyrosine kinase
receptors; these proteins may relay cell or matrix adhesive
events to the cytoplasm via a small carboxy-terminal kinase
domain (Schneider, R. and Schweiger, M., 1991, Oncogene
6:1807-1811). It is interesting to note that all of the
proteins with these cysteine-rich domains are involved in
extracellular functions, many of which relate to cell
adhesion. For example, the platelet glycoproteins I and V
help mediate the adhesion of platelets to sites of vascular
injury. The 5T4 oncofetal trophoblast glycoprotein appears to
be highly expressed in metastatic tumors.

The PKD1 protein also contains a single domain with
homology to C-type (calcium-dependent) lectin proteins
(Figures 7A-7B and 8). These domains are believed to be
involved in the extracellular binding of carbohydrate
residues for diverse purposes, including internalization of
glycosylated enzymes (asialoglycoprotein receptors),
participation in the extracellular matrix (versican) and cell
adhesion (selectins). The classification of C-type lectins
has been based on exon organization and the nature and
arrangement of domains within the protein. For example,
class I (extracellular proteoglycans) and class II (type II
transmembrane receptors) all have three exons encoding
the carbohydrate recognition domain (CRD), whereas in
classes III (collectins) and IV (LEC-CAMs) the domain is
encoded by a single exon. The CRD in the PKD1 C-type lectin
domain does not fit into the above classification because it
has a novel combination of protein domains and because it is
encoded by two exons (exons 5 and 6, Figures 6A-6P). Previous
analysis has failed to establish a correlation between the
type of carbohydrate bound to each C-type lectin and the
primary structure of its CRD.

Exon 10 encodes an LDL-A module (amino acids 642-672,
Figures 7A-7B), a cysteine-rich domain of about 40 amino
acids in length. This module was originally identified in
the LDL receptor, but it is also present in the extracellular
portions of many other proteins, often in tandem arrays
(Figures 7A-7B). Because of their hydrophobic nature, these
domains have been implicated as ligand-binding regions in LDL
receptor-related protein. Other proteins that, like the PKD1
protein, contain a single or nontandem LDL-A module include
the complement proteins (DiScipio, R.G., Gehring, M.R.,
Podack, E.R., Kan, C.C., Hugli, T.E. and Fey, G.H., 1984,
Proc. Natl. Acad. Sci. USA 81:7298-7302), calf enterokinase
(Kitamoto, Y., Yan, X.W., McCourt, D.W. and Sadler, J.E.,
1994, Proc. Natl. Acad. Sci. USA 91:7588-7592) and a sarcoma
virus adhesion protein.

In addition to extracellular protein modules that have
been recognized previously, the PKD1 protein contains a novel
domain of approximately 70 amino acids in length, present in
14 copies (Figures 7A-7B and 8). The first copy is encoded by
exon 5, between the LRRs and the C-type lectin module. The
other PKD domains are placed consecutively, starting at amino
acid 1100 and ending at amino acid 2331, and are contained in
exons 13, 14 and 15. Profile and motif searches (see
Section 11.1, above) identified several other extracellular
proteins that also contain one or more copies of this novel
domain, which we call the PKD domain. Whereas all known
extracellular modules seem to be restricted to higher
organisms, and the few exceptions seem to be evolutionary
accidents, the PKD domain was found in extracellular
parts of proteins from animals, eubacteria and
archaebacteria. The animal proteins containing an individual
PKD domain are heavily glycosylated, melanoma-associated cell
surface proteins, such as melanocyte-specific human Pmel17
(Kwon, B.S., 1993, J. Invest. Derm. 100 (Suppl.):134-140),
the MMP115 protein (Mochii, M., Agata, K. and Eguchi, G.,
1991, Pigment Cell Res. 4:41-47), and the nmb protein
(Weterman, M.A.J., Ajubi, N., van Dinter, I., Degen, W., van
Muijen, G., Ruiter, D.J. and Bloemers, H.P.J., 1995, Int. J.
Cancer 60:73-81). The physiological functions of these
glycoproteins remain to be elucidated. Four eubacterial
extracellular enzymes, three distinct collagenases and the
lysine-specific Achromobacter protease I (API), also contain a
single copy of the domain adjacent to their catalytic
domains. Curiously, the highest degree of similarity between
the collagenases is in the PKD domain. This may suggest that
the domain in eukaryotic cells is involved in binding to
collagenous domains. Four copies of the PKD domain are also
present in the surface layer protein (SlpB) from
Methanothermus. The SlpB protein is (as is the Pmel17
family) heavily glycosylated and is predicted to be a
glycoprotein component of the surface layer.

The PKD domain is predicted to be a globular domain that
contains an antiparallel β-sheet. Although the PKD domains
do not contain conserved cysteines, we believe they are
extracellular domains because: 1) all identified homologues
are extracellular or have the PKD domain in their
extracellular part; 2) the first domain (amino acids 281-353)
is located between other known extracellular modules; and 3)
there are no predicted transmembrane regions between the other
identified (extracellular) modules and the 13 remaining PKD
domains. Whereas the PKD domains in SlpB are very similar,
pointing to a rather recent duplication, the 14 domains in
PKD1 are rather divergent. Even the most conserved motif
(WDFGDG) (Figures 7A-7B) is considerably modified in some of
the PKD domains. Therefore, it is unlikely that unequal
recombination between the genomic sequences encoding these
motifs is a common source of mutations in this disease.
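To illustrate how a conserved motif such as WDFGDG can be tracked when it is modified in some copies, the Python sketch below slides the motif along a protein and reports windows with at most one mismatch. The mismatch limit and the test sequence are hypothetical; the profile and pattern searches described in Section 11.1 are considerably more sensitive than this.

```python
def motif_hits(protein: str, motif: str = "WDFGDG", max_mismatches: int = 1):
    """Return (1-based start, window, mismatches) for near-matches to `motif`."""
    k = len(motif)
    hits = []
    for i in range(len(protein) - k + 1):
        window = protein[i:i + k]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((i + 1, window, mismatches))
    return hits


# Hypothetical test sequence with one exact and one modified copy of the motif.
for hit in motif_hits("AAWDFGDGKKWEFGDGPP"):
    print(hit)
```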

Although it was not possible to identify specific
domains in the C-terminal half of the protein, a long region
was found which shows similarity to a putative C. elegans
chromosome III protein (accession number Z48544). A
hydrophobic stretch of 60 amino acids, from residue 3986 to
4045, might represent a transmembrane domain, but without any
clear resemblance to other such domains.
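The text does not say how the hydrophobic stretch at residues 3986-4045 was detected. One common approach is a Kyte-Doolittle hydropathy scan, sketched below in Python with a 19-residue window and a mean-score threshold of 1.6. Both values are assumptions here, and the example uses a synthetic sequence rather than PKD1.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(protein: str, window: int = 19, threshold: float = 1.6):
    """1-based (start, end) spans covered by windows whose mean hydropathy exceeds threshold.

    Overlapping or adjacent qualifying windows are merged into one span, so a
    span reports full window coverage and can extend a few residues past the
    hydrophobic core itself.
    """
    scores = [KD[aa] for aa in protein.upper()]
    spans = []
    for i in range(len(scores) - window + 1):
        if sum(scores[i:i + window]) / window > threshold:
            start, end = i + 1, i + window
            if spans and start <= spans[-1][1] + 1:
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


# Synthetic test protein: a 25-residue leucine block between lysine tails.
print(hydrophobic_segments("K" * 10 + "L" * 25 + "K" * 10))  # [(6, 40)]
```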


11.2.4. IDENTIFICATION OF AN ADPKD-CAUSING MUTATION

SSCP analysis was performed on samples obtained from 60
patients, as described above in Section 10.1. One variant
ADPKD individual was identified via SSCP. Upon
reamplification of amplified DNA from this individual (see
Section 10.1, above), it was revealed that the patient
carried a C to T transition at base pair 10,601 (exon 32)
of the full-length PKD1 transcript. This mutation created a
stop codon (TAG) at PKD1 amino acid position 765, which
previously coded for a glutamine (CAG), thus truncating the
final 728 amino acid residues which are normally present at
the carboxy end of the PKD1 protein and yielding a mutant
protein of 3576 amino acids. The mutation was also
predicted to create a novel StyI site (CCCTAG); genomic DNA
spanning this exon was amplified as before from the patient,
his parents, and over 60 other unrelated individuals (120
alleles). After StyI digestion, only the patient ZC (#118)
was heterozygous for the enzyme site. The absence of the
sequence change in over 120 alleles establishes that this is
not a polymorphic variation. The absence of the site in either
parent establishes this as a new mutation, which correlates
with the appearance of disease. Finally, the predicted
impact on the protein (truncation) by itself is highly
suggestive that it would impair or alter its function. This
evidence, even in the absence of examination of the remainder
of the gene or transcript in this patient, would generally be
considered sufficient proof that this mutation is the cause
of the disease.
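The control screen can be expressed as a bound on allele frequency. If a variant is absent from n independently sampled alleles, the exact one-sided 95% upper limit on its frequency is 1 - 0.05^(1/n), close to the familiar "rule of three" value 3/n. For the 120 alleles screened here, the Python sketch below gives roughly 2.5%. This calculation is an added illustration, not one reported in the text, and it assumes the control alleles are independent.

```python
def zero_count_upper_bound(n_alleles: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on allele frequency when 0 of n alleles carry it.

    Solves (1 - p) ** n == 1 - confidence for p.
    """
    return 1 - (1 - confidence) ** (1 / n_alleles)


n = 120  # control alleles screened in the text
print(f"exact bound: {zero_count_upper_bound(n):.4f}")  # about 0.0247
print(f"rule of three: {3 / n:.4f}")                      # 0.0250
```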

12. DEPOSIT OF MICROORGANISMS
The following microorganisms were deposited with the
American Type Culture Collection, Rockville, Maryland on May
27, 1994 and assigned the indicated accession numbers:

Microorganism    ATCC Accession No.
KG8              69636
cGGG10           69634
cDEB11           69635

The present invention is not to be limited in scope by
the specific embodiments described which are intended as
single illustrations of individual aspects of the invention,
and functionally equivalent methods and components are within
the scope of the invention. Indeed, various modifications of
the invention, in addition to those shown and described
herein, will become apparent to those skilled in the art from
the foregoing description and accompanying drawings. Such
modifications are intended to fall within the scope of the
appended claims.

WHAT IS CLAIMED IS:

1. An isolated nucleic acid containing a nucleotide
sequence which encodes a polycystic kidney disease (PKD1)
gene product.

2. The isolated nucleic acid of Claim 1 which encodes
the amino acid sequence (SEQ ID NO: 2) of the PKD1 gene
product depicted in FIG. 6.

3. The isolated nucleic acid of Claim 1 wherein the
nucleotide sequence is the nucleotide sequence (SEQ ID NO: 1)
depicted in FIG. 6.

4. The isolated nucleic acid of Claim 1 which hybridizes
under stringent conditions to the complement of the coding
sequence of the nucleotide sequence depicted in FIG. 6 (SEQ
ID NO: 1), or which hybridizes under less stringent
conditions and encodes a functionally equivalent PKD1 gene
product.

5. A nucleic acid vector containing the nucleotide
sequence of Claim 1, 2, 3 or 4.

6. An expression vector containing the nucleotide
sequence of Claim 1, 2, 3 or 4 in operative association with
a nucleotide regulatory element that controls expression of
the nucleotide sequence in a host cell.

7. An antisense molecule containing the nucleotide
sequence of Claim 4.

8. A ribozyme molecule containing the nucleotide
sequence of Claim 4.

9. A triple helix molecule containing the nucleotide
sequence of Claim 4.

10. The nucleotide vector of Claim 5 which is a plasmid
vector.

11. The nucleotide vector of Claim 5 which is a viral
vector.

12. A genetically engineered host cell containing the
nucleotide sequence of Claim 1, 2, 3 or 4.

13. A genetically engineered host cell containing the
nucleotide sequence of Claim 1, 2, 3 or 4 in operative
association with a regulatory element that controls
expression of the nucleotide sequence in the host cell.

14. A substantially pure PKD1 gene product.

15. The substantially pure PKD1 gene product of Claim 14
wherein the gene product contains the amino acid sequence
(SEQ ID NO: 2) depicted in FIG. 6.

15. An antibody that immunospecifically binds to a PKD1
gene product.

16. A method for diagnosing autosomal dominant
polycystic kidney disease, comprising detecting a mutant PKD1
gene or gene product in a patient sample.

17. A method for treating autosomal dominant polycystic
kidney disease, comprising administering an effective amount
of a compound to a patient in need of such treatment, which
compound inhibits the synthesis, expression or activity of a
mutant PKD1 gene product.

18. The method of Claim 17 in which the compound is an
antisense or ribozyme molecule that blocks translation of
mutant PKD1 mRNA.

19. The method of Claim 18 in which the compound is a
nucleotide that is complementary to the 5' region of the PKD1
gene, and blocks transcription of the PKD1 gene via triple
helix formation.

20. The method of Claim 19 further comprising replacing
the mutant PKD1 gene with a normal allele, or replacing the
mutant PKD1 gene product with a normal PKD1 gene product.

21. The method of Claim 19 in which the compound is an
antibody that immunospecifically binds and inactivates the
mutant PKD1 gene product.

22. A method for treating autosomal dominant polycystic
kidney disease, comprising administering a normal allele of
the PKD1 gene to a patient in need of such treatment, so that
the normal PKD1 allele is expressed in the patient.

23. A method for treating autosomal dominant polycystic
kidney disease, comprising administering an effective amount
of a normal PKD1 gene product to a patient in need of such
therapy.

24. A method of measuring the presence of a PKD1 gene
product in a sample, comprising:

(a) contacting the sample suspected of containing
a PKD1 gene product with an antibody that
binds to the PKD1 gene product under
conditions which allow for the formation of
reaction complexes comprising the antibody and
the PKD1 gene product;

(b) detecting the formation of reaction complexes
comprising the antibody and PKD1 gene product
in the sample, in which detection of the
formation of reaction complexes indicates the
presence of the PKD1 gene product in the
sample.

25. The method of Claim 24 in which the antibody is
bound to a solid phase support.

26. The method of Claim 24 in which the PKD1 gene
product is bound to a solid phase support.

27. The method of Claim 25 or 26 which additionally
comprises contacting the sample with a labeled PKD1 gene
product in step (a), and removing unbound substances prior to
step (b), in which a decrease in the amount of reaction
complexes comprising the antibody and the labeled PKD1 gene
product indicates the presence of the PKD1 gene product in
the sample.

28. A method of evaluating the level of PKD1 gene
product in a biological sample comprising:

(a) detecting the formation of reaction complexes
in a biological sample according to the method
of Claim 24; and

(b) evaluating the amount of reaction complexes
formed, which amount of reaction complexes
corresponds to the level of PKD1 gene product
in the biological sample.

29. A method of detecting or diagnosing the presence of
a disease associated with elevated or decreased levels of
PKD1 gene product in a mammalian subject comprising:

(a) evaluating the level of PKD1 gene product in a
biological sample from the mammalian subject
according to Claim 28; and

(b) comparing the level detected in step (a) to a
level of PKD1 gene product present in normal
subjects or in the subject at an earlier time,
in which an increase or a decrease in the
level of the PKD1 gene product as compared to
normal levels indicates a disease condition.
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30. A method for monitoring a therapeutic treatment of 
a disease associated with elevated or decreased levels of 
PKD1 gene product in a mammalian subject, comprising 
evaluating the levels of the PKD1 gene product in a series of 
5 biological samples obtained at different time points from a 
mammalian subject undergoing a therapeutic treatment for a 
disease associated with elevated or decreased levels of PKDl 
gene product, according to the method of Claim 28. 

10 31. The method according to Claim 29 or 30 wherein the 

disease associated with decreased levels of PKDl gene product 
is selected from the group consisting of polycystic kidney 
disease, and acquired cystic disease. 

!5 32. A test kit for measuring the presence of or amount 

of PKDl gene product in a sample, comprising 

(a) an antibody that immunospecif ically binds to a 
PKDl gene product; 

(b) means for detecting binding of the anti-PKDl 
20 gene product antibody to PKDl gene product in 

a sample; 

(c) other reagents; and 

(d) directions for use of the kit. 

25 33. A pharmaceutical composition for treating 

polycystic kidney disease in a mammal, comprising the PKDl 
gene product of Claim 14 and a pharmaceut ically acceptable 
carrier. 

34. A method for treating polycystic kidney disease in
a mammal, comprising administering an amount of a
pharmaceutical composition of Claim 33 effective to
ameliorate the symptoms of polycystic kidney disease.

35. A method for treating polycystic kidney disease in
a mammal, comprising increasing the expression of a protein
encoded by the nucleic acid of Claim 1, 2, 3 or 4.
1 ATG CCG CCC GCC GCG CCC GCC CGC CTG GCG CTG GCC CTG GGC CTG GGC CTG TGG CTC GGG 60

1 MPPAAPARLALALGLGLWLG 20

61 GCG CTG GCG GGG GGG CCC GGG CGC GGC TGC GGG CCC TGC GAG CCC CCC TGC CTC TGC GGG 120 

21 ALAGGPGRGCGPCEPPCLCG 40 

121 CCA GCG CCC GGC GCC GCC TGC CGC GTC AAC TGC TCG GGC CGC GGG CTG CGG ACG CTC GGT 180 

41 PAPGAACRVNCSGRG LRTLG 60 

181 CCC GCG CTG CGC ATC CCC GCG GAC GCC ACA GAG CTA GAC GTC TCC CAC AAC CTG CTC CGG 240 

61 PALRIPADATELDVSHNLLR 80

241 GCG CTG GAC GTT GGG CTC CTG GCG AAC CTC TCG GCG CTG GCA GAG CTG GAT ATA AGC AAC 300 

81 ALDVGLLANLSALAELDISN 100

301 AAC AAG ATT TCT ACG TTA GAA GAA GGA ATA TTT GCT AAT TTA TTT AAT TTA AGT GAA ATA 360 

101 NKISTLEEGIFANLFNLSEI 120

361 AAC CTG AGT GGG AAC CCG TTT GAG TGT GAC TGT GGC CTG GCG TGG CTG CCG CAA TGG GCG 420 

121 NLSGNPFECDCGLAWLPQWA 140 

421 GAG GAG CAG CAG GTG CGG GTG GTG CAG CCC GAG GCA GCC ACG TGT GCT GGG CCT GGC TCC 480 
141 EEQQVRVVQPEAATCAGPGS 160

481 CTG GCT GGC CAG CCT CTG CTT GGC ATC CCC TTG CTG GAC AGT GGC TGT GGT GAG GAG TAT 540 

161 LAGQPLLG IPL L DSGCGEEY 180 

541 GTC GCC TGC CTC CCT GAC AAC AGC TCA GGC ACC GTG GCA GCA GTG TCC TTT TCA GCT GCC 600 

181 V A C L P D N S S G T V A A V S F S A A 200 

601 CAC GAA GGC CTG CTT CAG CCA GAG GCC TGC AGC GCC TTC TGC TTC TCC ACC GGC CAG GGC 660 

201 HEGLLQPEACSAFCFSTGQG 220 

661 CTC GCA GCC CTC TCG GAG CAG GGC TGG TGC CTG TGT GGG GCG GCC CAG CCC TCC AGT GCC 720 

221 LAALSEQGWCLCGA AQPSSA 240 

721 TCC TTT GCC TGC CTG TCC CTC TGC TCC GGG CCC CCG GCA CCT CCT GCC CCC ACC TGT AGG 780 

241 SFACLSLCSGPPAPPAPTCR 260 

781 GGC CCC ACC CTC CTC CAG CAC GTC TTC CCT GCC TCC CCA GGG GCC ACC CTG GTG GGG CCC 840 

261 GPTLLQHVFPASPGATLVGP 280
841 CAC GGA CCT CTG GCC TCT GGC CAG CTA GCA GCC TTC CAC ATC GCT GCC CCG CTC CCT GTC 900

281 HGPLASGQLAAFHIAAPLPV 300

901 ACT GAC ACA CGC TGG GAC TTC GGA GAC GGC TCC GCC GAG GTG GAT GCC GCT GGG CCG GCT 960

301 TDTRWDFGDGSAEVDAAGPA 320

961 GCC TCG CAT CGC TAT GTG CTG CCT GGG CGC TAT CAC GTG ACG GCC GTG CTG GCC CTG GGG 1020 

321 ASHR YVL PGRY HVTAVL A L G 340 

1021 GCC GGC TCA GCC CTG CTG GGG ACA GAC GTG CAG GTG GAA GCG GCA CCT GCC GCC CTG GAG 1080

341 AGSALLGTDVQVEAAPAALE 360

1081 CTC GTG TGC CCG TCC TCG GTG CAG AGT GAC GAG AGC CTC GAC CTC AGC ATC CAG AAC CGC 1140 

361 LVCPSSVQSDESLDLSIQNR 380

1141 GGT GGT TCA GGC CTG GAG GCC GCC TAC AGC ATC GTG GCC CTG GGC GAG GAG CCG GCC CGA 1200 

381 GGSGLEAAYSIVALGEEPAR 400

1201 GCG GTG CAC CCG CTC TGC CCC TCG GAC ACG GAG ATC TTC CCT GGC AAC GGG CAC TGC TAC 1260 

401 AVHPLCPSDTEIFPGNGHCY 420

1261 CGC CTG GTG GTG GAG AAG GCG GCC TGG CTG CAG GCG CAG GAG CAG TGT CAG GCC TGG GCC 1320 

421 RLVVEKAAWLQAQEQCQAWA 440

1321 GGG GCC GCC CTG GCA ATG GTG GAC AGT CCC GCC GTG CAG CGC TTC CTG GTC TCC CGG GTC 1380 

441 G A A L AM VD S P AVQRF L V S R V 460 

1381 ACC AGG AGC CTA GAC GTG TGG ATC GGC TTC TCG ACT GTG CAG GGG GTG GAG GTG GGC CCA 1440 

461 TRSLDVWIGFSTVQGVEVGP 480

1441 GCG CCG CAG GGC GAG GCC TTC AGC CTG GAtf AGC TGC CAG AAC TGG CTG CCC GGG GAG CCA 1500 

481 APQGEAFSLESCQNWLPGEP 500

1501 CAC CCA GCC ACA GCC GAG CAC TGC GTC CGG CTC GGG CCC ACC GGG TGG TGT AAC ACC GAC 1560 

501 HPATAEHCVRLGPTGWCNTD 520 

1561 CTG TGC TCA GCG CCG CAC AGC TAC GTC TGC GAG CTG CAG CCC GGA GGC CCA GTG CAG GAT 1620 

521 LCSAPH SYVCELQPGGPVQD 540 

1621 GCC GAG AAC CTC CTC GTG GGA GCG CCC AGT GGG GAC CTG CAG GGA CCC CTG ACG CCT CTG 1680

541 AENLLVGAPSGDLQGPLTPL 560
1681 GCA CAG CAG GAC GGC CTC TCA GCC CCG CAC GAG CCC GTG GAG GTC ATG GTA TTC CCG GGC 1740
561 AQQDGLSAPHEPVEVMVFPG 580

1741 CTG CGT CTG AGC CGT GAA GCC TTC CTC ACC ACG GCC GAA TTT GGG ACC CAG GAG CTC CGG 1800
581 LRLSREAFLTTAEFGTQELR 600

1801 CGG CCC GCC CAG CTG CGG CTG CAG GTG TAC CGG CTC CTC AGC ACA GCA GGG ACC CCG GAG 1860
601 RPAQLRLQVYR LLSTAGTPE 620 

1861 AAC GGC AGC GAG CCT GAG AGC AGG TCC CCG GAC AAC AGG ACC CAG CTG GCC CCC GCG TGC 1920 
621 NGSEPESRSPDNRTQLAPAC 640

1921 ATG CCA GGG GGA CGC TGG TGC CCT GGA GCC AAC ATC TGC TTG CCG CTG GAC GCC TCC TGC 1980 
641 MPGGRWC PGANICLPLD A SC 660 

1981 CAC CCC CAG GCC TGC GCC AAT GGC TGC ACG TCA GGG CCA GGG CTA CCC GGG GCC CCC TAT 2040 
661 HPQACANGCTSGPGLPGAPY 680

2041 GCG CTA TGG AGA GAG TTC CTC TTC TCC GTT CCC GCG GGG CCC CCC GCG CAG TAC TCG GTC 2100 
681 ALW REF LFSVPAGPPAQYSV 700 

2101 ACC CTC CAC GGC CAG GAT GTC CTC ATG CTC CCT GGT GAC CTC GTT GGC TTG CAG CAC GAC 2160 
701 TLHGQDVLMLPGDLVGLQHD 720 

2161 GCT GGC CCT GGC GCC CTC CTG CAC TGC TCG CCG GCT CCC GGC CAC CCT GGT CCC CGG GCC 2220 
721 AGPGALLHCSPAPGHPGPRA 740

2221 CCG TAC CTC TCC GCC AAC GCC TCG TCA TGG CTG CCC CAC TTG CCA GCC CAG CTG GAG GGC 2280 
741 PYLSANASSWLPHLPAQLEG 760

2281 ACT TGG GGC TGC CCT GCC TGT GCC CTG CGG CTG CTT GCA CAA CGG GAA CAG CTC ACC GTG 2340
761 TWGCPACALRL LAQREQLTV 780 

2341 CTG CTG GGC TTG AGG CCC AAC CCT GGA CTG CGG CTG CCT GGG CGC TAT GAG GTC CGG GCA 2400 
781 LLGLRPNPGLRL PGR YEVRA 800 

2401 GAG GTG GGC AAT GGC GTG TCC AGG CAC AAC CTC TCC TGC AGC TTT GAC GTG GTC TCC CCA 2460 
801 EVGNGVSRHNLSCSFDVVSP 820

2461 GTG GCT GGG CTG CGG GTC ATC TAC CCT GCC CCC CGC GAC GGC CGC CTC TAC GTG CCC ACC 2520 
821 VAGLRVIYPAPRDGRLYVPT 840
2521 AAC GGC TCA GCC TTG GTG CTC CAG GTG GAC TCT GGT GCC AAC GCC ACG GCC ACG GCT CGC 2580

841 NGSALVLQVDSGANATATAR 860

2581 TGG CCT GGG GGC AGT CTC AGC GCC CGC TTT GAG AAT GTC TGC CCT GCC CTG GTG GCC ACC 2640

861 WPGGSL SARFENVCPAL VAT 880 

2641 TTC GTG CCC GCC TGC CCC TGG GAG ACC AAC GAT ACC CTG TTC TCA GTG GTA GCA CTG CCG 2700 

881 FVPACPWETNDTLFSVVALP 900

2701 TGG CTC AGT GAG GGG GAG CAC GTG GTG GAC GTG GTG GTG GAA AAC AGC GCC AGC CGG GCC 2760

901 W L S E G E H V V D V V V E N S A S R A 920 

2761 AAC CTC AGC CTG CGG GTG ACG GCG GAG GAG CCC ATC TGT GGC CTC CGC GCC ACG CCC AGC 2820

921 NLSLRVTAEEPICGLRATPS 940

2821 CCC GAG GCC CGT GTA CTG CAG GGA GTC CTA GTG AGG TAC AGC CCC GTG GTG GAG GCC GGC 2880 

941 P E A R V L Q G V L V R Y S P V V E A G 960 

2881 TCG GAC ATG GTC TTC CGG TGG ACC ATC AAC GAC AAG CAG TO CTG ACC TTC CAG AAC GTG 2940 

961 SDMVFRWTINDKQSLTFQNV 980

2941 GTC TTC AAT GTC ATT TAT CAG AGC GCG GCG GTC TTC AAG CTC TCA CTG ACG GCC TCC AAC 3000 

981 VFNVI YQS AAVFK L SLTASN 1000 

3001 CAC GTG AGC AAC GTC ACC GTG AAC TAC AAC GTA ACC GTG GAG CGG ATG AAC AGG ATG CAG 3060 

1001 HVSNVTVNYNVTVERMNRMQ 1020

3061 GGT CTG CAG GTC TCC ACA GTG CCG GCC GTG CTG TCC CCC AAT GCC ACG CTA GCA CTG ACG 3120

1021 GLQVSTVPAVLSPNATLALT 1040 

3121 GCG GGC GTG CTG GTG GAC TCG GCC GTG GAG GTG GCC TTC CTG TGG ACC TTT GGG GAT GGG 3180 

1041 AGVLVDSAVEVAFLWTFGDG 1060 

3181 GAG CAG GCC CTC CAC CAG TTC CAG CCT CCG TAC AAC GAG TCC TTC CCA GTT CCA GAC CCC 3240 

1061 EQALHQFQPPYNESFPVPDP 1080

3241 TCG GTG GCC CAG GTG CTG GTG GAG CAC AAT GTC ACG CAC ACC TAC GCT GCC CCA GGT GAG 3300 

1081 S V A Q V L V E H N V T H T Y A A P G E 1100 
3301 TAC CTC CTG ACC GTG CTG GCA TCI AAT GCC TTC GAG AAC CTG ACG CAG CAG GTG CCT GTG 3360
1101 YLL TVL ASNAFENL TQQVPV 1120 

3361 AGC GTG CGC GCC TCC CTG CCC TCC GTG GCT GTG GGT GTG AGT GAC GGC GTC CTG GTG GCC 3420 
1121 SVRASLPSVAVGVSDGVL VA 1140 

3421 GGC CGG CCC GTC ACC TTC TAC CCG CAC CCG CTG CCC TCG CCT GGG GGT GTT CTT TAC ACG 3480 
1141 GRPVTFYPHPLPSPGGVLYT 1160

3481 TGG GAC TTC GGG GAC GGC TCC CCT GTC CTG ACC CAG AGC CAG CCG GCT GCC AAC CAC ACC 3540 
1161 WDFGDGSPVLTQSQPAANHT 1180

3541 TAT GCC TCG AGG GGC ACC TAC CAC GTG CGC CTG GAG GTC AAC AAC ACG GTG AGC GGT GCG 3600 
1181 YASRGTYHVRLEVNNTVSGA 1200

3601 GCG GCC CAG GCG GAT GTG CGC GTC TTT GAG GAG CTC CGC GGA CTC AGC GTG GAC ATG AGC 3660 
1201 AAQADVRVFEELRGLSVDMS 1220

3661 CTG GCC GTG GAG CAG GGC GCC CCC GTG GTG GTC AGC GCC GCG GTG CAG ACG GGC GAC AAC 3720 
1221 LAVEQGAPVVVSAAVQTGDN 1240 

3721 ATC ACG TGG ACC TTC GAC ATG GGG GAC GGC ACC GTG CTG TCG GGC CCG GAG GCA ACA GTG 3780 
1241 ITWTFDMGDGTVLSGPEATV 1260

3781 GAG CAT GTG TAC CTG CGG GCA CAG AAC TGC ACA GTG ACC GTG GGT GCG GGC AGC CCC GCC 3840 
1261 EHVYLRAQNC T VTVGAGSPA 1280 

3841 GGC CAC CTG GCC CGG AGC CTG CAC GTG CTG GTC TTC GTC CTG GAG GTG CTG CGC GTT GAA 3900 
1281 GH L ARS L HVL VFVL E VL R VE 1300 

3901 (XX GCC GCC TGC ATC CCC ACG CAG CCT GAC GCG CGG CTC ACG GCC TAC GTC ACC GGG AAC 3960 
1301 P A A C I P T Q P D A R L T A Y V T G N 1320 

3961 CCG GCC CAC TAC CTC TTC GAC TGG ACC TTC GGG GAT GGC TCC TCC AAC ACG ACC GTG CGG 4020 
1321 PAHYLFDWTFGDGSSNTTVR 1340

4021 GGG TGC CCG ACG GTG ACA CAC AAC TTC ACG CGG AGC GGC ACG TTC CCC CTG GCG CTG GTG 4080 
1341 GCPTVTHNFTRSGTFPLALV 1360 
4081 CTG TCC AGC CGC GTG AAC AGG GCG CAT TAC TTC ACC AGC ATC TGC GTG GAG CCA GAG GTG 4140

1361 LSSRVNRAHYFTS ICVEPEV 1380 

4141 GGC AAC GTC ACC CTG CAG CCA GAG AGG CAG TTT GTG CAG CTC GGG GAC GAG GCC TGG CTG 4200 

1381 GNVT LQPERQFVQLGD E AWL 1400 

4201 GTG GCA TGT GCC TGG CCC CCG TTC CCC TAC CGC TAC ACC TGG GAC TTT GGC ACC GAG GAA 4260 

1401 VACAWPPFPYRYTWDFGTEE 1420

4261 GCC GCC CCC ACC CGT GCC AGG GGC CCT GAG GTG ACG TTC ATC TAC CGA GAC CCA GGC TCC 4320

1421 AAPTRARGPEVTFIYRDPGS 1440

4321 %T CTT GTG ACA GTC ACC GCG TCC AAC AAC ATC TCT GCT GCC AAT GAC TCA GCC CTG GTG 4380 

1441 YLVTVTASNNISAANDSALV 1460

4381 GAG GTG CAG GAG CCC GTG CTG GTC ACC AGC ATC AAG GTC AAT GGC TCC CTT GGG CTG GAG 4440 

1461 EVQEPVLVTSIKVNGSLGLE 1480

4441 CTG CAG CAG CCG TAC CTG TTC TCT GCT GTG GGC CGT GGG CGC CCC GCC AGC TAC CTG TGG 4500 

1481 LQQPYLFSAVGRGRPASYLW 1500

4501 GAT CTG GGG GAC GGT GGG TGG CTC GAG GGT CCG GAG GTC ACC CAC GCT TAC AAC AGC ACA 4560 

1501 DLGDGGWLEGPEVTHAYNST 1520 

4561 GGT GAC TTC ACC GTT AGG GTG GCC GGC TGG AAT GAG GTG AGC CGC AGC GAG GCC TGG CTC 4620 

1521 GDFTVRVAGWNEVSRSEAWL 1540

4621 AAT GTG ACG GTG AAG CGG CGC GTG CGG GGG CTC GTC GTC AAT GCA AGC CGC ACG GTG GTG 4680

1541 N VTVKRRVRGLVVNASRTVV 1560 

4681 CCC CTG AAT GGG AGC GTG AGC TTC AGC ACG TCG CTG GAG GCC GGC AGT GAT GTG CGC TAT 4740

1561 PLNGSVSFSTSLEAGSDVRY 1580

4741 TCC TGG GTG CTC TGT GAC CGC TGC ACG CCC ATC CCT GGG GGT CCT ACC ATC TCT TAC ACC 4800 

1581 SWVLCDRCTPIPGGPT ISYT 1600 

4801 TTC CGC TCC GTG GGC ACC TTC AAT ATC ATC GTC ACG GCT GAG AAC GAG GTG GGC TCC GCC 4860 

1601 FRSVGTFNI IVTAENEVGSA 1620 
4861 CAG GAC AGC ATC TTC GTC TAT GTC CTG CAG CTC ATA GAG GGG CTG CAG GTG GTG GGC GGT 4920

1621 QDSIFVYVLQLIEGLQVVGG 1640

4921 GGC CGC TAC TTC CCC ACC AAC CAC ACG GTA CAG CTG CAG GCC GTG GTT AGG GAT GGC ACC 4980

1641 GRYFPTNHTVQLQAVVRDGT 1660

4981 AAC GTC TCC TAC AGC TGG ACT GCC TGG AGG GAC AGG GGC CCG GCC CTG GCC GGC AGC GGC 5040 

1661 NVSYSWTAWRDRGPAL AGSG 1680 

5041 AAA GGC TTC TCG CTC ACC GTG CTC GAG GCC GGC ACC TAC CAT GTG CAG CTG CGG GCC ACC 5100 

1681 K G F S L T V L E AG T Y HV Q L R A T 1700 

5101 AAC ATG CTG GGC AGC GCC TGG GCC GAC TGC ACC ATG GAC TTC GTG GAG CCT GTG GGG TGG 5160 

1701 NMLGSAWADCTMDFVEPVGW 1720

5161 CTG ATG GTG GCC GCC TCC CCG AAC CCA GCT GCC GTC AAC ACA AGC GTC ACC CTC AGT GCC 5220 

1721 LMVAASPNPAAVNTS V TLSA 1740 

5221 GAG CTG GCT GGT GGC AGT GGT GTC GTA TAC ACT TGG TCC TTG GAG GAG GGG CTG AGC TGG 5280 

1741 ELAGGSGVVYTWSLEEGLSW 1760 

5281 GAG ACC TCC GAG CCA TTT ACC ACC CAT AGC TTC CCC ACA CCC GGC CTG CAC TTG GTC ACC 5340 

1761 ETSEPFTTHSFPTPGLHLVT 1780

5341 ATG ACG GCA GGG AAC CCG CTG GGC TCA GCC AAC GCC ACC GTG GAA GTG GAT GTG CAG GTG 5400 

1781 MTAGNPLGSANATVEVDVQV 1800

5401 CCT GTG AGT GGC CTC AGC ATC AGG GCC AGC GAG CCC GGA GGC AGC TTC GTG GCG GCC GGG 5460

1801 PVSGLS I RASEPGGSF V A A G 1820 

5461 TCC TCT GTG CCC TTT TGG GGG CAG CTG GCC ACG GGC ACC AAT GTG AGC TGG TGC TGG GCT 5520 

1821 SSVPFWGQLATGTNVSWCWA 1840 

5521 GTG CCC GGC GGC AGC AGC AAG CGT GGC CCT CAT GTC ACC ATG GTC TTC CCG GAT GCT GGC 5580 

1841 VPGGSSKRGPHVTMVFPDAG 1860

5581 ACC TTC TCC ATC CGG CTC AAT GCC TCC AAC GCA GTC AGC TGG GTC TCA GCC ACG TAC AAC 5640 

1861 TFSIRLNASNAVSWVSATYN 1880
5641 CTC ACG GCG GAG GAG CCC ATC GTG GGC CTG GTG CTG TGG GCC AGC AGC AAG GTG GTG GCG 5700

1881 L TAEEP I VGLVLWASSKVVA 1900 

5701 CCC GGG CAG CTG GTC CAT TTT CAG ATC CTG CTG GCT GCG GGC TCA GCT GTC ACC TTC CGC 5760

1901 PGQLVHFQILLAAGSAVTFR 1920

5761 CTA CAG GTC GGC GGG GCC AAC CCC GAG GTG CTC CCC GGG CCC CGT TTC TCC CAC AGC TTC 5820

1921 LQVGGANPEVLPGPRFSHSF 1940 

5821 CCC CGC GTC GGA GAC CAC GTG GTG AGC GTG CGG GGC AAA AAC CAC GTG AGC TGG GCC CAG 5880 

1941 PRVGDHVVSVRGKNHVSWAQ 1960

5881 GCG CAG GTG CGC ATC GTG GTG CTG GAG GCC GTG AGT GGG CTG CAG GTG CCC AAC TGC TGC 5940 

1961 AQVRIVVLEAVSGLQVPNCC 1980 

5941 GAG CCT GGC ATC GCC ACG GGC ACT GAG AGG AAC TTC ACA GCC CGC GTG CAG CGC GGC TCT 6000 

1981 EPGIATGTERNFTARVQRGS 2000

6001 CGG GTC GCC TAC GCC TGG TAC TTC TCG CTG CAG AAG GTC CAG GGC GAC TCG CTG GTC ATC 6060

2001 RVAYAWYFSLQKVQGDSLVI 2020

6061 CTG TCG GGC CGC GAC GTC ACC TAC ACG CCC GTG GCC GCG GGG CTG TTG GAG ATC CAG GTG 6120 

2021 LSGRDVTYTPVAAGLLEIQV 2040

6121 CGC GCC TTC AAC GCC CTG GGC AGT GAG AAC CGC ACG CTG GTG CTG GAG GTT CAG GAC GCC 6180 

2041 RAFNALGSENRTLVLEVQDA 2060

6181 GTC CAG TAT GTG GCC CTG CAG AGC GGC CCC TGC TTC ACC AAC CGC TCG GCG CAG TTT GAG 6240 

2061 VQYVALQSGPCFTNRSAQFE 2080

6241 GCC GCC ACC AGC CCC AGC CCC CGG CGT GTG GCC TAC CAC TGG GAC TTT GGG GAT GGG TCG 6300 

2081 AATSPSPRRVAYHWDFGDGS 2100

6301 CCA GGG CAG GAC ACA GAT GAG CX AGG GCC GAG CAC TCC TAC CTG AGG CCT GGG GAC TAC 6360 

2101 PGQD TDE PRAEHSYLRPG DY 2120 

6361 CGC GTG CAG GTG AAC GCC TCC AAC CTG GTG AGC TTC TTC GTG GCG CAG GCC ACG GTG ACC 6420 

2121 RVQ VNASNLVSFFVAQATVT 2140 
6421 GTC CAG GTG CTG GCC TGC CGG GAG CCG GAG GTG GAC GTG GTC CTG CCC CTG CAG GTG CTG 6480

2141 VQVLACREPEVDVVLPLQVL 2160

6481 ATG CGG CGA TCA CAG CGC AAC TAC TTG GAG GCC CAC GTT GAC CTG CGC GAC TGC GTC ACC 6540

2161 MRRSQRNYLE AHVDLRD CVT 2180 

6541 TAC CAG ACT GAG TAC CGC TGG GAG GTG TAT CGC ACC GCC AGC TGC CAG CGG CCG GGG CGC 6600 

2181 YQTEYRWEVYRTASCQRPGR 2200

6601 CCA GCG CGT GTG GCC CTG CCC GGC GTG GAC GTG AGC CGG CCT CGG CTG GTG CTG CCG CGG 6660 

2201 PARVALPGVDVSRPRLVLPR 2220

6661 CTG GCG CTG CCT GTG GGG CAC TAC TGC TTT GTG TTT GTC GTG TCA TTT GGG GAC ACG CCA 6720 

2221 LALPVGHYCFVFVVSFGDTP 2240

6721 CTG ACA CAG AGC ATC CAG GCC AAT GTG ACG GTG GCC CCC GAG CGC CTG GTG CCC ATC ATT 6780 

2241 LTQS I QANVTVAPERLV P I I 2260 

6781 GAG GGT GGC TCA TAC CGC GTG TGG TCA GAC ACA CGG GAC CTG GTG CTG GAT GGG AGC GAG 6840 

2261 EGGSYRVWSDTRDLVLDGSE 2280

6841 TCC TAC GAC CCC AAC CTG GAG GAC GGC GAC CAG ACG CCG CTC AGT TTC CAC TGG GCC TGT 6900

2281 SYDPNLEDGDQTPLSFHWAC 2300

6901 GTG GCT TCG ACA CAG AGG GAG GCT GGC GGG TGT GCG CTG AAC TTT GGG CCC CGC GGG AGC 6960 

2301 VASTQREAGGCALNFGPRGS 2320

6961 AGC ACG GTC ACC ATT CCA CGG GAG CGG CTG GCG GCT GGC GTG GAG TAC Aft TTC AGC CTG 7020 

2321 STVTIPRERLAAGVEYTFSL 2340

7021 ACC GTG TGG AAG GCC GGC CGC AAG GAG GAG GCC ACC AAC CAG ACG GTG CTG ATC CGG AGT 7080

2341 TVWKAGRKEEATNQTVLIRS 2360

7081 GGC CGG GTG CCC ATT GTG TCC TTG GAG TGT GTG TCC TGC AAG GCA CAG GCC GTG TAC GAA 7140 

2361 G R V P I VSLECVSCKAQAVYE 2380 

7141 GTG AGC CGC AGC TCC TAC GTG TAC TTG GAG GGC CGC TGC CTC AAT TGC AGC AGC GGC TCC 7200 

2381 VSRSSY VYLEGRCLNCSSGS 2400 
7201 AAG CGA GGG CGG TGG GCT GCA CGT ACG TTC AGC AAC AAG ACG CTG GTG CTG GAT GAG ACC 7260

2401 KRGRWAARTFSNK TLVLDET 2420 

7261 ACC ACA TCC ACG GGC AGT GCA GGC ATG CGA CTG GTG CTG CGG CGG GGC GTG CTG CGG GAC 7320

2421 TTSTGSAGMRLVLRRGVLRD 2440

7321 GGC GAG GGA TAC ACC TTC ACG CTC ACG GTG CTG GGC CGC TCT GGC GAG GAG GAG GGC TGC 7380 

2441 GEGYTFTLTVLGRSGEEEGC 2460

7381 GCG TCC ATC CGC CTG TCC CCC AAC CGC CCG CCG CTG GGG GGC TCT TGC CGC CTC TTC CCA 7440

2461 A SIRLSPNRPPLGGSCRLFP 2480 

7441 CTG GGC GCT GTG CAC GCC CTC ACC ACC AAG GTG CAC TTC GAA TGC ACG GGC TGG CAT GAC 7500 

2481 L GAVH A L T T KVHF E C TGWH D 2500 

7501 GCG GAG GAT GCT GGC GCC CCG CTG GTG TAC GCC CTG CTG CTG CGG CGC TGT CGC CAG GGC 7560 

2501 AEDAGAPLVYALLLRRCRQG 2520

7561 CAC TGC GAG GAG TTC TGT GTC TAC AAG GGC AGC CTC TCC AGC TAC GGA GCC GTG CTG CCC 7620 

2521 HCE EFCVYKG SLSSYGAVLP 2540 

7621 CCG GGT TTC AGG CCA CAC TTC GAG GTG GGC CTG GCC GTG GTG GTG CAG GAC CAG CTG GGA 7680 

2541 PGFRPHFEVGLAVVVQDQLG 2560 

7681 GCC GCT GTG GTC GCC CTC AAC AGG TCT TTG GCC ATC ACC CTC CCA GAG CCC AAC GGC AGC 7740 

2561 A A V V A L N RSLA I T LPEPNG S 2580 

7741 GCA ACG GGG CTC ACA GTC TGG CTG CAC GGG CTC ACC GCT AGT GTG CTC CCA GGG CTG CTG 7800

2581 ATGLTVWLHGLTASVLPGLL 2600

7801 CGG CAG GCC GAT CCC CAG CAC GTC ATC GAG TAC TCG TTG GCC CTG GTC ACC GTG CTG AAC 7860

2601 RQADPQHVIEYSLALVTVLN 2620

7861 GAG TAC GAG CGG GCC CTG GAC GTG GCG GCA GAG CCC AAG CAC GAG CGG CAG CAC CGA GCC 7920

2621 EYERALDVAAEPKHERQHRA 2640

7921 CAG ATA CGC AAG AAC ATC ACG GAG ACT CTG GTG TCC CTG AGG GTC CAC ACT GTG GAT GAC 7980 

2641 QIRKNITETLVSLRVHTVDD 2660
7981 ATC CAG CAG ATC GCT GCT GCG CTG GCC CAG TGC ATG GGG CCC AGC AGG GAG CTC GTA TGC 8040

2661 IQQIAAALAQCMGPSRELVC 2680

8041 CGC TCG TGC CTG AAG CAG ACG CTG CAC AAG CTG GAG GCC ATG ATG CTC ATC CTG CAG GCA 8100 

2681 RSCLKQTLHKLEAMMLILQA 2700

8101 GAG ACC ACC GCG GGC ACC GTG ACG CCC ACC GCC ATC GGA GAC AGC ATC CTC AAC ATC ACA 8160 

2701 ETTAGTVTPTAIGDSILNIT 2720

8161 GGA GAC CTC ATC CAC CTG GCC AGC TCG GAC GTG CGG GCA CCA CAG CCC TCA GAG CTG GGA 8220 

2721 GDLIHLASSDVRAPQPSELG 2740

8221 GCC GAG TCA CCA TCT CGG ATG GTG GCG TCC CAG GCC TAC AAC CTG ACC TCT GCC CTC ATG 8280 

2741 AESPSRMVASQAYNLTSALM 2760

8281 CGC ATC CTC ATG CGC TCC CGC GTG CTC AAC GAG GAG CCC CTG ACG CTG GCG GGC GAG GAG 8340 

2761 RILMRSRVLNEEPLTLAGEE 2780

8341 ATC GTG GCC CAG GGC AAG CGC TCG GAC CCG CGG AGC CTG CTG TGC TAT GGC GGC GCC CCA 8400 

2781 IVAQGKRSDPRSLLCYGGAP 2800

8401 GGG CCT GGC TGC CAC TTC TCC ATC CCC GAG GCT TTC AGC GGG GCC CTG GCC AAC CTC AGT 8460

2801 GPGCHFSIPEAFSGALANLS 2820

8461 GAC GTG GTG CAG CTC ATC TTT CTG GTG GAC TCC AAT CCC TTT CCC TTT GGC TAT ATC AGC 8520 

2821 DVVQLIFLVDSNPFPFGYIS 2840

8521 AAC TAC ACC GTC TCC ACC AAG GTG GCC TCG ATG GCA TTC CAG ACA CAG GCC GGC GCC CAG 8580 

2841 NYTVSTKVASMAFQTQAGAQ 2860

8581 ATC CCC ATC GAG CGG CTG GCC TCA GAG CGC GCC ATC ACC GTG AAG GTG CCC AAC AAC TCG 8640 

2861 IPIERLASERAITVKVPNNS 2880

8641 GAC TGG GCT GCC CGG GGC CAC CGC AGC TCC GCC AAC TCC GCC AAC TCC GTT GTG GTC CAG 8700 

2881 DWAARGHRSSANSANSVVVQ 2900 

8701 CCC CAG GCC TCC GTC GGT GCT GTG GTC ACC CTG GAC AGC AGC AAC CCT GCG GCC GGG CTG 8760 

2901 PQASVGAVVTLDSSNPAAGL 2920
8761 CAT CTG CAG CTC AAC TAT ACG CTG CTG GAC GGC CAC TAC CTG TCT GAG GAA CCT GAG CCC 8820
2921 HLQLNYTLLDGHYLSEEPEP 2940

8821 TAC CTG GCA GTC TAC CTA CAC TCG GAG CCC CGG CCC AAT GAG CAC AAC TGC TCG GCT AGC 8880 
2941 YLAVYLHSEPRPNEHNCSAS 2960 

8881 AGG AGG ATC CGC CCA GAG TCA CTC CAG GGT GCT GAC CAC CGG CCC TAC ACC TTC TTC ATT 8940 
2961 RRIRPESLQGADHRPYTFFI 2980

8941 TCC CCC GGG AGC AGA GAC CCA GCG GGG AGT TAC CAT CTG AAC CTC TCC AGC CAC TTC CGC 9000
2981 SPGSRDPAGSYHLNLSSHFR 3000

9001 TGG TCG GCG CTG CAG GTG TCC GTG GGC CTG TAC ACG TCC CTG TGC CAG TAC TTC AGC GAG 9060 
3001 WSALQVSVGLYTSLCQYFSE 3020 

9061 GAG GAC ATG GTG TGG CGG ACA GAG GGG CTG CTG CCC CTG GAG GAG ACC TCG CCC CGC CAG 9120 
3021 EDMVWRTEGLLPLEETSPRQ 3040

9121 GCC GTC TGC CTC ACC CGC CAC CTC ACC GCC TTC GGC GCC AGC CTC TTC GTG CCC CCA AGC 9180 
3041 AVCLTRHLTAFGASLFVPPS 3060

9181 CAT GTC CGC TTT GTG TTT CCT GAG CCG ACA GCG GAT GTA AAC TAC ATC GTC ATG CTG ACA 9240 
3061 HVRFVFPEPTADVNYIVMLT 3080

9241 TGT GCT GTG TGC CTG GTG ACC TAC ATG GTC ATG GCC GCC ATC CTG CAC AAG CTG GAC CAG 9300 
3081 CAVCLVTYMVMAAILHKLDQ 3100

9301 JTG GAT GCC AGC CGG GGC CGC GCC ATC CCT TTC TGT GGG CAG CGG GGC CGC TTC AAG TAC 9360 
3101 LDASRGRAIPFCGQRGRFKY 3120

9361 GAG ATC CTC GTC AAG ACA GGC TGG GGC CGG GGC TCA GGT ACC ACG GCC CAC GTG GGC ATC 9420 
3121 EILVKTGWGRGSGTTAHVGI 3140

9421 ATG CTG TAT GGG GTG GAC AGC CGG AGC GGC CAC CGG CAC CTG GAC GGC GAC AGA GCC TTC 9480 
3141 MLYGVDSRSGHRHLDGDRAF 3160

9481 CAC CGC AAC AGC CTG GAC ATC TTC CGG ATC GCC ACC CCG CAC AGC CTG GGT AGC GTG TGG 9540 
3161 HRNSLDIFRIATPHSLGSVW 3180 

9541 AAG ATC CGA GTG TGG CAC GAC AAC AAA GGG CTC AGC CCT GCC TGG TTC CTG CAG CAC GTC 9600
3181 KIRVWHDNKGLSPAWFLQHV 3200
9601 ATC GTC AGG GAC CTG CAG ACG GCA CGC AGC GCC TTC TTC CTG GTC AAT GAC TGG CTT TCG 9660

3201 IVRDLQTARSAFFLVNDWLS 3220

9661 GTG GAG ACG GAG GCC AAC GGG GGC CTG GTG GAG AAG GAG GTG CTG GCC GCG AGC GAC GCA 9720 

3221 VETEANGGLVEKEVLAASDA 3240

9721 GCC CTT TTG CGC TTC CGG CGC CTG CTG GTG GCT GAG CTG CAG CGT GGC TTC TTT GAC AAG 9780 

3241 ALLRFRRLLVAELQRGFFDK 3260

9781 CAC ATC TGG CTC TCC ATA TGG GAC CGG CCG CCT CGT AGC CGT TTC ACT CGC ATC CAG AGG 9840 

3261 HIWLSIWDRPPRSRFTRIQR 3280 

9841 GCC ACC TGC TGC GTT CTC CTC ATC TGC CTC TTC CTG GGC GCC AAC GCC GTG TGG TAC GGG 9900 

3281 ATCC VLLICLFLGAN AVW YG 3300 

9901 GCT GTT GGC GAC TCT GCC TAC AGC ACG GGG CAT GTG TCC AGG CTG AGC CCG CTG AGC GTC 9960 

3301 AVGDSAYSTGHVSRLSPL SV 3320 

9961 GAC ACA GTC GCT GTT GGC CTG GTG TCC AGC GTG GTT GTC TAT CCC GTC TAC CTG GCC ATC 10020 

3321 D T V A V G L V S S V V V Y P V Y L A I 3340 

10021 CTT TTT CTC TTC CGG ATG TCC CGG AGC AAG GTG GCT GGG AGC CCG AGC CCC ACA CCT GCC 10080 

3341 LFLFRMSRSKVAGSPSPTPA 3360 

10081 GGG CAG CAG GTG CTG GAC ATC GAC AGC TGC CTG GAC TCG TCC GTG CTG GAC AGC TCC TTC 10140 

3361 GQQVLDIDSCLDSSVLDSSF 3380

10141 CTC ACG TTC TCA GGC CTC CAC GCT GAG CAG GCC TTT GTT GGA CAG ATG AAG AGT GAC TTG 10200 

3381 LTFSGLHAEQAFVGQMKSDL 3400

10201 TTT CTG GAT GAT TCT AAG AGT CTG GTG TGC TGG CCC TCC GGC GAG GGA ACG CTC AGT TGG 10260 

3401 FLDDSKSLVCWPSGEGTLSW 3420

10261 CCG GAC CTG CTC AGT GAC CCG TCC ATT GTG GGT AGC AAT CTG CGG CAG CTG GCA CGG GGC 10320 

3421 PDLLSDPSIVGSNLRQLARG 3440

10321 CAG GCG GGC CAT GGG CTG GGC CCA GAG GAG GAC GGC TTC TCC CTG GCC AGC CCC TAC TCG 10380 

3441 QAGHGLGP EEDGFSLASPYS 3460 

10381 CCT GCC AAA TCC TTC TCA GCA TCA GAT GAA GAC CTG ATC CAG CAG GTC CTT GCC GAG GGG 10440 

3461 PAKSFSASDEDLIQQVLAEG 3480
10441 GTC AGC AGC CCA GCC CCT ACC CAA GAC ACC CAC ATC GAA ACG GAC CTG CTC AGC AGC CTG 10500

3481 VSSPAPTQDTHMETDLLSSL 3500

10501 TCC AGC ACT CCT GGG GAG AAG ACA GAG ACG CTG GCG CTG CAG AGG CTG GGG GAG CTG GGG 10560 

3501 SSTPGEKTETLALQRLGELG 3520

10561 CCA CCC AGC CCA GGC CTG AAC TGG GAA CAG CCC CAG GCA GCG AGG CTG TCC AGG ACA GGA 10620 

3521 PPSPGLNWEQPQAARLSRTG 3540 

10621 CTG GTG GAG GGT CTG CGG AAG CGC CTG CTG CCG GCC TGG TGT GCC TCC CTG GCC CAC GGG 10680

3541 LVEGLRKRLLPAWCASLAHG 3560 

10681 CTC AGC CTG CTC CTG GTG GCT GTG GCT GTG GCT GTC TCA GGG TGG GTG GGT GCG AGC TTC 10740 

3561 LSLLLVAVAVAVSGWVGASF 3580 

10741 CCC CCG GGC GTG AGT GTT GCG TGG CTC CTG TCC AGC AGC GCC AGC TTC CTG GCC TCA TTC 10800

3581 PPGVSVAWLLSSSASFLASF 3600

10801 CTC GGC TGG GAG CCA CTG AAG GTC TTG CTG GAA GCC CTG TAC TTC TCA CTG GTG GCC AAG 10860 

3601 LGWEPLKVLLEALYFSLVAK 3620

10861 CGG CTG CAC CCG GAT GAA GAT GAC ACC CTG GTA GAG AGC CCG GCT GTG ACG CCT GTG AGC 10920 

3621 RLHPDEDDTLVESPAVTPVS 3640

10921 GCA CGT GTG CCC CGC GTA CGG CCA CCC CAC GGC TTT GCA CTC TTC CTG GCC AAG GAA GAA 10980 

3641 ARVPRVRPPHGFA L F LAK E E 3660 

10981 GCC CGC AAG GTC AAG AGG CTA CAT GGC ATG CTG CGG AGC CTC CTG GTG TAC ATG CTT TTT 11040

3661 AR KVKRLHGMLRSLLVYMLF 3680 

11041 CTG CTG GTG ACC CTG CTG GCC AGC TAT GGG GAT GCC TCA TGC CAT GGG CAC GCC TAC CGT 11100 

3681 LLVTLLASYGDASCHGHAYR 3700 

11101 CTG CAA AGC GCC ATC AAG CAG GAG CTG CAC AGC CGG GCC TTC CTG GCC ATC ACG CGG TCT 11160

3701 LQSA IKQELHSRAFLAI TRS 3720 

11161 GAG GAG CTC TGG CCA TGG ATG GCC CAC GTG CTG CTG CCC TAC GTC CAC GGG AAC CAG TCC 11220 

3721 EELWPWMAHVL LPYVHGNQS 3740 

11221 AGC CCA GAG CTG GGG CCC CCA CGG CTG CGG CAG GTG CGG CTG CAG GAA GCA CTC TAC CCA 11280 

3741 SPELGPPRLRQVRLQEALYP 3760 
11281 GAC CCT CCC GGC CCC AGG GTC CAC ACG TGC TCG GCC GCA GGA GGC TTC AGC ACC AGC GAT 11340

3761 DPPGPRVHTCSAAGGFSTSD 3780

11341 TAC GAC GTT GGC TGG GAG AGT CCT CAC AAT GGC TCG GGC ACG TGG GCC TAT TCA GCG CCG 11400

3781 YDVGWESP HNGSGTWAYSAP 3800 

11401 GAT CTG CTG GGG GCA TGG TCC TGG GGC TCC TGT GCC GTG TAT GAC AGC GGG GGC TAC GTG 11460

3801 DLLGAWSWGSCAVYDSGGYV 3820

11461 CAG GAG CTG GGC CTG AGC CTG GAG GAG AGC CGC GAC CGG CTG CGC TTC CTG CAG CTG CAC 11520

3821 QELGLSLEESRDRLRFLQLH 3840

11521 AAC TGG CTG GAC AAC AGG AGC CGC GCT GTG TTC CTG GAG CTC ACG CGC TAC AGC CCG GCC 11580

3841 NWLDN RSRAVFLELTRYSPA 3860 

11581 GTG GGG CTG CAC GCC GCC GTC ACG CTG CGC CTC GAG TTC CCG GCG GCC GGC CGC GCC CTG 11640

3861 VGL HAAVTL R L EFPAAGR AL 3880 

11641 GCC GCC CTC AGC GTC CGC CCC TTT GCG CTG CGC CGC CTC AGC GCG GGC CTC TCG CTG CCT 11700 

3881 AAL SVRPFAL RRLS AGL SLP 3900 

11701 CTG CTC ACC TCG GTG TGC CTG CTG CTG TTC GCC GTG CAC TTC GCC GTG GCC GAG GCC CGT 11760

3901 LLTSVCLLLFAVHFAVAEAR 3920

11761 ACT TGG CAC AGG GAA GGG CGC TGG CGC GTG CTG CGG CTC GGA GCC TGG GCG CGG TGG CTG 11820 

3921 TWHREGRWRVLRLGAWARWL 3940

11821 CTG GTG GCG CTG ACG GCG GCC ACG GCA CTG GTA CGC CTC GCC CAG CTG GGT GCC GCT GAC 11880

3941 LVALTAATALVRLAQLGAAD 3960

11881 CGC CAG TGG ACC CGT TTC GTG CGC GGC CGC CCG CGC CGC TTC ACT AGC TTC GAC CAG GTG 11940 

3961 RQWTRFVRGRPRRFTSFDQV 3980 

11941 GCG CAC GTG AGC TCC GCA GCC CGT GGC CTG GCG GCC TCG CTG CTC TTC CTG CTT TTG GTC 12000 

3981 AHVSS AARG L AASLLF L L L V 4000 

12001 AAG GCT GCC CAG CAC GTA CGC TTC GTG CGC CAG TGG TCC GTC TTT GGC AAG ACA TTA TGC 12060 

4001 KAAQHVRFVRQWSVFGKTLC 4020

12061 CCA GCT CTG CCA GAG CTC CTG GGG GTC ACC TTG GGC CTG GTG GTG CTC GGG GTA GCC TAC 12120 

4021 RALPELLGVTLGLVVLGVAY 4040 
12121 GCC CAG CTG GCC ATC CTG CTC GTG TCT TCC TGT GTG GAC TCC CTC TGG AGC GTG GCC CAG 12180
4041 AQLAILLVSSCVDSLWSVAQ 4060

12181 GCC CTG TTG GTG CTG TGC CCT GGG ACT GGG CTC TCT ACC CTG TGT CCT GCC GAG TCC TGG 12240 
4061 ALLVLC PGTGL STLCPAESW 4080 

12241 CAC CTG TCA CCC CTG CTG TGT GTG GGG CTC TGG GCA CTG CGG CTG TGG GGC GCC CTA CGG 12300 

4081 HLSPLLCVGLWALRLWGALR 4100

12301 ItG GGG GCT GTT ATT CTC CGC TGG CGC TAC CAC GCC TTG CGT GGA GAG CTG TAC CGG CCG 12360 

4101 LGAVILRWRYHALRGELYRP 4120

12361 GCC TGG GAG CCC CAG GAC TAC GAG ATG GTG GAG TTG TTC CTG CGC AGG CTG CGC CTC TGG 12420 

4121 AWEPQDYEMVELFLRRLRLW 4140 

12421 ATG GGC CTC AGC AAG GTC AAG GAG TTC CGC CAC AAA GTC CGC TTT GAA GGG ATG GAG CCG 12480 

4141 MGLSKVKEFRHKV R FEGMEP 4160 

12481 CTG CCC TCT CGC TCC TCC AGG GGC TCC AAG GTA TCC CCG GAT GTG CCC CCA CCC AGC GCT 12540 

4161 LPSRSSRGSKVSPDVPPPSA 4180 

12541 GGC TCC GAT GCC TCG CAC CCC TCC ACC TCC TCC AGC CAG CTG GAT GGG CTG AGC GTG AGC 12600 

4181 GSDASHPSTSSSQLDGLSVS 4200

12601 CTG GGC CGG CTG GGG ACA AGG TGT GAG CCT GAG CCC TCC CGC CTC CAA GCC GTG TTC GAG 12660 

4201 L GRLGTRCEPEP SRLQAVF E 4220 

12661 CCC CTG CTC ACC CAG TTT GAC CGA CTC AAC CAG GCC ACA GAG GAC GTC TAC CAG CTG GAG 12720 

4221 PLLTQFDRLNQATEDVYQLE 4240

12721 CAG CAG CTG CAC AGC CTG CAA GGC CGC AGG AGC AGC CGG GCG CCC GCC GGA TCT TCC CGT 12780 

4241 QQLHSLQGRRSSRAPAGSSR 4260

12781 GGC CCA TCC CCG GGC CTG CGG CCA GCA CTG CCC AGC CGC CTT GCC CGG GCC ACT CGG GGT 12840 

4261 GPSPGLRPALPSRLARASRG 4280 

12841 GTG GAC CTG GCC ACT GGC CCC AGC AGG ACA CCC CTT OGG GCC AAC AAC AAG GTC CAC (XX 12900

4281 VDLATGPSRTPLRAKNKVHP 4300

12901 AGC AGC ACT TAG 

4301 S S T • 
1 MPPAAPARLA LALGLGLWLG ALAGGPGRGC GPCEPPCLCG PAPGAACRVN CSGRGLRTLG PALRIPADAT ELDVSHNLLR 80

SIGNAL PEPTIDE   LRR CYSTEINE-RICH AMINO TERMINUS

81 ALDVGLLANL SALAELDISN NKISTLEEGI FANLFNLSEI NLSGNPFECD CGLAWLPQWA EEQQVRVVQP EAATCAGPGS 160

LRR1   LRR2   LRR CYSTEINE-RICH CARBOXY TERMINUS

161 LAGQPLLGIP LLDSGCGEEY VACLPDNSSG TVAAVSFSAA HEGLLQPEAC SAFCFSTGQG LAALSEQGWC LCGAAQPSSA 240
241 SFACLSLCSG PPAPPAPTCR GPTLLQHVFP ASPGATLVGP HGPLASGQLA AFHIAAPLPV TDTRWDFGDG SAEVDAAGPA 320

PKD1 R1

321 ASHRYVLPGR YHVTAVLALG AGSALLGTDV QVEAAPAALE LVCPSSVQSD ESLDLSIQNR GGSGLEAAYS IVALGEEPAR 400

401 AVHPLCPSDT EIFPGNGHCY RLVVEKAAWL QAQEQCQAWA GAALAMVDSP AVQRFLVSRV TRSLDVWIGF STVQGVEVGP 480

C-TYPE LECTIN BINDING DOMAIN
481 APQGEAFSLE SCQNWLPGEP HPATAEHCVR LGPTGWCNTD LCSAPHSYVC ELQPGGPVQD AENLLVGAPS GDLQGPLTPL 560

561 AQQDGLSAPH EPVEVMVFPG LRLSREAFLT TAEFGTQELR RPAQLRLQVY RLLSTAGTPE NGSEPESRSP DNRTQLAPAC 640

641 MPGGRWCPGA NICLPLDASC HPQACANGCT SGPGLPGAPY ALWREFLFSV PAGPPAQYSV TLHGQDVLML PGDLVGLQHD 720
LDL-A

721 AGPGALLHCS PAPGHPGPRA PYLSANASSW LPHLPAQLEG TWGCPACALR LLAQREQLTV LLGLRPNPGL RLPGRYEVRA 800

801 EVGNGVSRHN LSCSFDVVSP VAGLRVIYPA PRDGRLYVPT NGSALVLQVD SGANATATAR WPGGSLSARF ENVCPALVAT 880

881 FVPACPWETN DTLFSVVALP WLSEGEHVVD VVVENSASRA NLSLRVTAEE PICGLRATPS PEARVLQGVL VRYSPVVEAG 960

961 SDMVFRWTIN DKQSLTFQNV VFNVIYQSAA VFKLSLTASN HVSNVTVNYN VTVERMNRMQ GLQVSTVPAV LSPNATLALT 1040

1041 AGVLVDSAVE VAFLWTFGDG EQALHQFQPP YNESFPVPDP SVAQVLVEHN VTHTYAAPGE YLLTVLASNA FENLTQQVPV 1120

1121 SVRASLPSVA VGVSDGVLVA GRPVTFYPHP LPSPGGVLYT WDFGDGSPVL TQSQPAANHT YASRGTYHVR LEVNNTVSGA 1200

PKD1 R3

1201 AAQADVRVFE ELRGLSVDMS LAVEQGAPVV VSAAVQTGDN ITWTFDMGDG TVLSGPEATV EHVYLRAQNC TVTVGAGSPA 1280

PKD1 R4

1281 GHLARSLHVL VFVLEVLRVE PAACIPTQPD ARLTAYVTGN PAHYLFDWTF GDGSSNTTVR GCPTVTHNFT RSGTFPLALV 1360

PKD1 R5

1361 LSSRVNRAHY FTSICVEPEV GNVTLQPERQ FVQLGDEAWL VACAWPPFPY RYTWDFGTEE AAPTRARGPE VTFIYRDPGS 1440

PKD1 R6

1441 YLVTVTASNN ISAANDSALV EVQEPVLVTS IKVNGSLGLE LQQPYLFSAV GRGRPASYLW DLGDGGWLEG PEVTHAYNST 1520

PKD1 R7

1521 GDFTVRVAGW NEVSRSEAWL NVTVKRRVRG LVVNASRTVV PLNGSVSFST SLEAGSDVRY SWVLCDRCTP IPGGPTISYT 1600

PKD1 R8

1601 FRSVGTFNII VTAENEVGSA QDSIFVYVLQ LIEGLQVVGG GRYFPTNHTV QLQAVVRDGT NVSYSWTAWR DRGPALAGSG 1680

PKD1 R9

1681 KGFSLTVLEA GTYHVQLRAT NMLGSAWADC TMDFVEPVGW LMVAASPNPA AVNTSVTLSA ELAGGSGVVY TWSLEEGLSW 1760

PKD1 R10

1761 ETSEPFTTHS FPTPGLHLVT MTAGNPLGSA NATVEVDVQV PVSGLSIRAS EPGGSFVAAG SSVPFWGQLA TGTNVSWCWA 1840

PKD1 R11
1841 VPGGSSKRGP HVTMVFPDAG TFSIRLNASN AVSWVSATYN LTAEEPIVGL VLWASSKVVA PGQLVHFQIL LAAGSAVTFR 1920

PKD1 R12

1921 LQVGGANPEV LPGPRFSHSF PRVGDHVVSV RGKNHVSWAQ AQVRIVVLEA VSGLQVPNCC EPGIATGTER NFTARVQRGS 2000

PKD1 R13

2001 RVAYAWYFSL QKVQGDSLVI LSGRDVTYTP VAAGLLEIQV RAFNALGSEN RTLVLEVQDA VQYVALQSGP CFTNRSAQFE 2080

2081 AATSPSPRRV AYHWDFGDGS PGQDTDEPRA EHSYLRPGDY RVQVNASNLV SFFVAQATVT VQVLACREPE VDVVLPLQVL 2160
PKD1 R14

2161 MRRSQRNYLE AHVDLRDCVT YQTEYRWEVY RTASCQRPGR PARVALPGVD VSRPRLVLPR LALPVGHYCF VFVVSFGDTP 2240

2241 LTQSIQANVT VAPERLVPII EGGSYRVWSD TRDLVLDGSE SYDPNLEDGD QTPLSFHWAC VASTQREAGG CALNFGPRGS 2320

2321 STVTIPRERL AAGVEYTFSL TVWKAGRKEE ATNQTVLIRS GRVPIVSLEC VSCKAQAVYE VSRSSYVYLE GRCLNCSSGS 2400

2401 KRGRWAARTF SNKTLVLDET TTSTGSAGMR LVLRRGVLRD GEGYTFTLTV LGRSGEEEGC ASIRLSPNRP PLGGSCRLFP 2480

2481 LGAVHALTTK VHFECTGWHD AEDAGAPLVY ALLLRRCRQG HCEEFCVYKG SLSSYGAVLP PGFRPHFEVG LAVVVQDQLG 2560

2561 AAVVALNRSL AITLPEPNGS ATGLTVWLHG LTASVLPGLL RQADPQHVIE YSLALVTVLN EYERALDVAA EPKHERQHRA 2640

2641 QIRKNITETL VSLRVHTVDD IQQIAAALAQ CMGPSRELVC RSCLKQTLHK LEAMMLILQA ETTAGTVTPT AIGDSILNIT 2720

2721 GDLIHLASSD VRAPQPSELG AESPSRMVAS QAYNLTSALM RILMRSRVLN EEPLTLAGEE IVAQGKRSDP RSLLCYGGAP 2800

2801 GPGCHFSIPE AFSGALANLS DVVQLIFLVD SNPFPFGYIS NYTVSTKVAS MAFQTQAGAQ IPIERLASER AITVKVPNNS 2880

2881 DWAARGHRSS ANSANSVVVQ PQASVGAVVT LDSSNPAAGL HLQLNYTLLD GHYLSEEPEP YLAVYLHSEP RPNEHNCSAS 2960

2961 RRIRPESLQG ADHRPYTFFI SPGSRDPAGS YHLNLSSHFR WSALQVSVGL YTSLCQYFSE EDMVWRTEGL LPLEETSPRQ 3040

3041 AVCLTRHLTA FGASLFVPPS HVRFVFPEPT ADVNYIVMLT CAVCLVTYMV MAAILHKLDQ LDASRGRAIP FCGQRGRFKY 3120

3121 EILVKTGWGR GSGTTAHVGI MLYGVDSRSG HRHLDGDRAF HRNSLDIFRI ATPHSLGSVW KIRVWHDNKG LSPAWFLQHV 3200

3201 IVRDLQTARS AFFLVNDWLS VETEANGGLV EKEVLAASDA ALLRFRRLLV AELQRGFFDK HIWLSIWDRP PRSRFTRIQR 3280

3281 ATCCVLLICL FLGANAVWYG AVGDSAYSTG HVSRLSPLSV DTVAVGLVSS VVVYPVYLAI LFLFRMSRSK VAGSPSPTPA 3360

3361 GQQVLDIDSC LDSSVLDSSF LTFSGLHAEQ AFVGQMKSDL FLDDSKSLVC WPSGEGTLSW PDLLSDPSIV GSNLRQLARG 3440

3441 QAGHGLGPEE DGFSLASPYS PAKSFSASDE DLIQQVLAEG VSSPAPTQDT HMETDLLSSL SSTPGEKTET LALQRLGELG 3520

3521 PPSPGLNWEQ PQAARLSRTG LVEGLRKRLL PAWCASLAHG LSLLLVAVAV AVSGWVGASF PPGVSVAWLL SSSASFLASF 3600

3601 LGWEPLKVLL EALYFSLVAK RLHPDEDDTL VESPAVTPVS ARVPRVRPPH GFALFLAKEE ARKVKRLHGM LRSLLVYMLF 3680

3681 LLVTLLASYG DASCHGHAYR LQSAIKQELH SRAFLAITRS EELWPWMAHV LLPYVHGNQS SPELGPPRLR QVRLQEALYP 3760

3761 DPPGPRVHTC SAAGGFSTSD YDVGWESPHN GSGTWAYSAP DLLGAWSWGS CAVYDSGGYV QELGLSLEES RDRLRFLQLH 3840

3841 NWLDNRSRAV FLELTRYSPA VGLHAAVTLR LEFPAAGRAL AALSVRPFAL RRLSAGLSLP LLTSVCLLLF AVHFAVAEAR 3920

3921 TWHREGRWRV LRLGAWARWL IVALTAATAL VRLAQLGAAO RQWTRFVRGR PRRFTSFDQV AHVSSAARGL AASLLFLLLV 4000 

4001 KAAQHVRFVR QWSVFGKTLC RALPELLCVT LGLWLGVAY AQLAILLVSS CVDSLWSVAQ ALLVLCPGTG LSTLCPAESW 4080 

4081 HLSPLLCVGL WALRLCALR LGAV1LRWRY HAIRGELYRP AWEPGOYEMV ELFIRRLRIW UGLSKVKEFR H(VRFEG»€P 4160 

4161 LPSRSSRGSK V5PDVPPPSA GSDASHPSTS SSQLOGISVS LGRLGTRCEP EPSRLQAVFE ALLTQFORLN QATEDVYQLE 4240 

4241 QQLHSLQGRR SSRAPAGSSR GPSPGLRPAL PSRLARASRG VDIATGPSRT PLRAKNKVHP SSTZ 4304 